Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Bash from the process of building the site #623

Closed
wants to merge 1 commit into from

Conversation

Bowrna
Copy link

@Bowrna Bowrna commented Jul 4, 2022

closes: #338

@Bowrna Bowrna force-pushed the site_bash_changes branch 3 times, most recently from ba3c5da to a6dab16 Compare July 10, 2022 04:15
site.py Outdated Show resolved Hide resolved
site.py Outdated Show resolved Hide resolved
@Bowrna Bowrna force-pushed the site_bash_changes branch 2 times, most recently from 8c554fd to f5fcb3e Compare July 17, 2022 12:41
@Bowrna
Copy link
Author

Bowrna commented Jul 21, 2022

airflow-site/site.sh

Lines 77 to 114 in 988ae48

function relativepath {
source=$1
target=$2
common_part=$source # for now
result="" # for now
while [[ "${target#"$common_part"}" == "${target}" ]]; do
# no match, means that candidate common part is not correct
# go up one level (reduce common part)
common_part="$(dirname "$common_part")"
# and record that we went back, with correct / handling
if [[ -z $result ]]; then
result=".."
else
result="../$result"
fi
done
if [[ $common_part == "/" ]]; then
# special case for root (no common path)
result="$result/"
fi
# since we now have identified the common part,
# compute the non-common part
forward_part="${target#"$common_part"}"
# and now stick all parts together
if [[ -n $result ]] && [[ -n $forward_part ]]; then
result="$result$forward_part"
elif [[ -n $forward_part ]]; then
# extra slash removal
result="${forward_part:1}"
fi
echo "$result"
}

Can someone explain to me what's happening in this code part? i find it difficult to understand what's happening here.

@mik-laj could you help me?

@mik-laj
Copy link
Member

mik-laj commented Jul 21, 2022

@Bowrna see: https://stackoverflow.com/a/12498485 This generates a relative path for use in a container.

@Bowrna
Copy link
Author

Bowrna commented Jul 22, 2022

@Bowrna see: https://stackoverflow.com/a/12498485 This generates a relative path for use in a container.

Could you tell me why we need to convert the absolute path into relative path to run the script? Wouldn't it run fine with the absolute path? @mik-laj

@potiuk
Copy link
Member

potiuk commented Jul 22, 2022

Because in container it will be in a different location ("/opt/airlfow/.....") rather than <YOUR LOCAL AIRLFOW SOURCES>/....

@Bowrna
Copy link
Author

Bowrna commented Jul 22, 2022

Ah I see @potiuk. As the sh file is replaced to run without docker dependency, i forgot about this point. Thanks

@Bowrna
Copy link
Author

Bowrna commented Jul 24, 2022

create_redirect "dist/docs/${package_name}/index.html" "/docs/${package_name}/stable/index.html"

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk
Could anyone help me to understand?

I have this doubt because we are redirecting to the path "/docs/${package_name}/stable/index.html". So I wanted to ensure if this is still valid even after removing the docker part for building site. Thanks

@Bowrna Bowrna marked this pull request as ready for review July 24, 2022 11:54
@potiuk
Copy link
Member

potiuk commented Jul 25, 2022

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk

Nope. Not needed. I think it was not even needed before for a long time already. This is likely a remnant of a very old approach - long time defunct.

I think what is important is that after buildng the site locallly, you can start the server and locally browse the site, also a check that after deploying to airflow site it still works should happen, but this is something we can likely do only after we merge it.

@Bowrna
Copy link
Author

Bowrna commented Jul 26, 2022

@potiuk I get the following error when building the site and I am not sure how I can solve this. This occurs when hugo command runs under the hood. This happens when running the existing site.sh script too

Start building sites …
hugo v0.101.0+extended darwin/arm64 BuildDate=unknown
INFO 2022/07/26 19:15:23 syncing static files to /
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for layout "search" for kind "page": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "section": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "section": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "taxonomy": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
WARN 2022/07/26 19:15:23 found no layout file for "HTML" for kind "taxonomy": You should create a template file which matches Hugo Layouts Lookup Rules for this combination.
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
ERROR 2022/07/26 19:15:23 render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
Error: Error building site: failed to render pages: render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found
Total in 825 ms
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 255.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

@rossturk
Copy link
Contributor

rossturk commented Jul 26, 2022

Error: Error building site: failed to render pages: render of "page" failed: "/Users/sathishkannan/code/airflow-site/landing-pages/site/layouts/blog/baseof.html:23:7": execute of template failed: template: blog/single.html:23:7: executing "blog/single.html" at <partial "head.html" .>: error calling partial: partial "head.html" not found

Hmmmm I may be wrong here but if I remember correctly this is one of the errors that occurs when the submodules aren't cloned.

Have you run git submodules update --init?

@Bowrna
Copy link
Author

Bowrna commented Jul 27, 2022

Have you run git submodules update --init?

Yes after running this the build-site command works. Thanks, @rossturk.
I have error occurring in check-site-links command and it fails with following error message.

sathishkannan@Sathishs-MacBook-Air airflow-site % python site_tool.py check-site-links
Checking if node module exists
Check if landing-pages/dist/index.html file exists
/opt/homebrew/bin/lynx
Working directory: /Users/sathishkannan/code/airflow-site/landing-pages
Dist directory: ./dist/

./check-links.sh: line 61: readarray: command not found
sathishkannan@Sathishs-MacBook-Air airflow-site % ./site.sh check-site-links
2022-07-27 11:32:33:INFO: Checking if node module exists
2022-07-27 11:32:33:INFO: Check if /Users/sathishkannan/code/airflow-site/landing-pages/dist/index.html file exists
2022-07-27 11:32:33:INFO: Changing to /Users/sathishkannan/code/airflow-site/landing-pages/ and running: ./check-links.sh@
/opt/homebrew/bin/lynx
Working directory: /Users/sathishkannan/code/airflow-site/landing-pages
Dist directory: ./dist/

./check-links.sh: line 61: readarray: command not found
sathishkannan@Sathishs-MacBook-Air airflow-site %

Am I missing downloading anything here? I checked in CONTRIBUTORS.md guide but didn't find anything relevant to this.
I am working on Mac M1

@rossturk
Copy link
Contributor

rossturk commented Jul 27, 2022

./check-links.sh: line 61: readarray: command not found

@Bowrna I think this has to do with the version of bash. If I try this on my M1 Mac, I get an error:

bash-3.2$ bash -version
bash -version
GNU bash, version 3.2.57(1)-release (arm64-apple-darwin21)
Copyright (C) 2007 Free Software Foundation, Inc.

bash-3.2$ readarray foo < <(ls -al)
readarray foo < <(ls -al)
ls -al
bash: readarray: command not found

But if I try it on my Linux machine, I get this:

rturk@caymus:~$ bash --version
GNU bash, version 5.0.17(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

rturk@caymus:~$ readarray foo < <(ls -al)

rturk@caymus:~$ echo ${foo[8]}
drwxr-xr-x 2 rturk caymus 4096 Oct 6 2021 .dbt

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

% brew info bash
bash: stable 5.1.16 (bottled), HEAD
...

@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

I think it has never been supposed to be run on Mac :).

And one of the reasons we want to rewrite everything in Python, is that bash is no longer "common" accross multiple platform. The MacOS Bash is Bash 3 (for licencing reason - Apple refused to use Bash4 due to GPLv3 (https://news.ycombinator.com/item?id=20102597) and a lot of tools there are not really POSIX compliant, so there are plenty of hacks around that (readarray is one of the features of Bash4).

The ultiimate goal we have is that Breeze and all our tools can be used WITHOUT having or using bash in the Host. Fully, Once we get there, the next step will be to make sure that it works on Windows.

And that should be our goal here as well (even though site.sh is not really used by many people). So - let's just not mention in CONTRIBUTING.md that you need to install bash from brew. I deliberately avoided that necessity, because that it yet another prerequisite that adds friction to the first-time experience.

Ideally just Python3 + Docker + Docker-Compose should be enough to do everything developer needs to do in Airlfow.

@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

This has profound consequences. Many tools which "assume" bash is v3 on Mac will stop working if you make Bash 4 default. Even if you try to install it via brew it will tell you that it is cask-only, which means that it is not becoming a default bash - you need to perform an extra step to make it default precisely because it might break a lot of things on Mac.

Going bash-less is the only good route we can take - now when we have Python3.7+ everywhere (previously various options of having Python2 + Python 3 (3.5 as well) created similar set of problems, but since everyone is now on Python 3.7+ (3.6 EOL reached) basing "lowest common denominator" on Python 3.7 is great for such an environment (that's why we have the whole "Rewrite Breeze to Python" project now and not 2 years ago..

@rossturk
Copy link
Contributor

I think perhaps CONTRIBUTING.md should instruct folks to install bash from Homebrew. I haven't tested it myself, but this looks promising:

This has profound consequences. Many tools which "assume" bash is v3 on Mac will stop working if you make Bash 4 default. Even if you try to install it via brew it will tell you that it is cask-only, which means that it is not becoming a default bash - you need to perform an extra step to make it default precisely because it might break a lot of things on Mac.

I'm sorry, what are you suggesting as an alternative?

@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

I'm sorry, what are you suggesting as an alternative?

Remove Bash. Completely.

I think this whole task is about removing bash and replacing it with Python :).
I think the problem @Bowrna has (correct me @Bowrna if I am wrong) that you cannot really run the OLD script corrrectly. This can be fixed locally by updating bash with brew and setting temporary PATH to get it first on path.

Once the script is replaced with Bash, there is no need to mention it in CONTRIBUTING - because ..... there will be no bash to run any more.

Or am I missing something :)?

@rossturk
Copy link
Contributor

I'm sorry, what are you suggesting as an alternative?

Remove Bash. Completely. I think this whole task is about removing bash and replacing it with Python :). I think the problem @Bowrna has (correct me @Bowrna if I am wrong) that you cannot really run the OLD script corrrectly. This can be fixed locally by updating bash with brew and setting temporary PATH to get it first on path. Once the script is replaced with Bash, there is no need to mention it in CONTRIBUTING - because ..... there will be no bash to run any more.

Or am I missing something :)?

Yes, you are missing that this PR title is "replace site.sh" and the script we are talking about is landing-pages/check-links.sh. Are you suggesting that the scope of this PR is expanded to include all shell scripts?

@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

Yes, you are missing that this PR title is "replace site.sh" and the script we are talking about is landing-pages/check-links.sh. Are you suggesting that the scope of this PR is expanded to include all shell scripts?

Ahhhh. I see now 🤦 .

Precisely. This was the whole purpose of the task - to replace all the bash usage. It makes no sense only to replace site.sh. The reason why we are doing it is not because "we do not like bash", but because "we want to get rid of bash from the host entirely" - to be able to run it all on MacOS, Linux, Windows or anything else that has "native" Python (and potentially has no bash).

So the TITLE of the task is wrong. Should be "Remove Bash from the process of building the site".

@potiuk potiuk changed the title replacing site.sh with python Remove Bash from the process of building the site Jul 27, 2022
@potiuk
Copy link
Member

potiuk commented Jul 27, 2022

And the root cause of the confusion was that when the original issue was created, there was indeed only one site.sh :

See the description of the issue by @mik-laj - so that's why the title was wrong

#338

commented on 9 Dec 2020

In this project, we have one bash script - ./site.sh that starts to grow more and more, so its maintenance becomes more problematic. I think it's a good idea to rewrite it into Python.

@Bowrna
Copy link
Author

Bowrna commented Jul 27, 2022

Ah, I see the whole point now. Currently to see how this check-site-links command, I have to install bash via brew and see what is happening under the hood. I will try converting that part to python after understanding what is happening in that part of code.

@Bowrna
Copy link
Author

Bowrna commented Jul 28, 2022

Trying to understand the below bash command. Could anyone tell me what's happening in this command

mkdir -pv "${DIST_DIR}"
readarray -d '' pages < <(find "${DIST_DIR}" -name '*.html' -print0)
echo "Found ${#pages[@]} HTML files."

When trying to execute the above script I just enabled print and saw it's logging a few html files.
Screenshot 2022-07-28 at 11 48 17 PM

But when I took out that command and executed it via terminal it shows a big list of HTML files. I don't understand how this is different from executing in a script. When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh
Screenshot 2022-07-29 at 12 04 18 AM

@potiuk
Copy link
Member

potiuk commented Jul 28, 2022

It fills the array 'pages' with the list of HTML files found. The problem you see is likely the -print 0 flag- rather than producing files separated by "spaces" it separates them with \0 (null character) so that there is no confusion between space being separator between files and one that is part of the file name.

so if you have

  • a file.html
  • another file.html

This wil produce

"a file.html\0another file.html"

Then when you read it with -d '' you will get array:

['a file.html', 'another file.html']

Otherwise you'd get:

['a', 'file.html', 'another', 'file.html']

The problem with differences you see is that the "\0" separated files are not broken over multiple lines because they are really a very long line with no spaces to allow the break happen.

@Bowrna
Copy link
Author

Bowrna commented Jul 29, 2022

When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh

Could you share your view for this point? This is also another difference in the output when executing it in different ways @potiuk .

@potiuk
Copy link
Member

potiuk commented Jul 29, 2022

When executing in terminal I see index.html from all providers and its different version say example /dist//docs/apache-airflow-providers-imap/1.0.0/_modules/index.html. But i don't see them when executing it via check-links.sh

Could you share your view for this point? This is also another difference in the output when executing it in different ways @potiuk .

I think it's just the matter of when it is called and whether the files are there when it is executed. Some earlier steps likely prepare the files in dist folder and maybe that step is missing when you run it manually ? Can't think of any other reason.

@Bowrna
Copy link
Author

Bowrna commented Jul 29, 2022

I have tried executing both of ways consecutively without running any other step in between. Let me debug this again and come with more information @potiuk

@potiuk
Copy link
Member

potiuk commented Nov 8, 2022

Hey @Bowrna are you going to continue with that one :) ? Not the highest prirority, so if we do not plan it, maybe better to close that one?

@Bowrna
Copy link
Author

Bowrna commented Nov 10, 2022

I will work on this part this weekend @potiuk. As I got stuck on resolving the issue, I started fixing other issues and slowly I became dormant in working on this issue.

@mik-laj
Copy link
Member

mik-laj commented Nov 11, 2022

create_redirect "dist/docs/${package_name}/index.html" "/docs/${package_name}/stable/index.html"

Is this redirect still valid even after removing the docker part @mik-laj @potiuk @turbaszek @rossturk Could anyone help me to understand?

It is common to delete part of a URL to go to the parent location.
For example, when a user visits a URL:

https://airflow.apache.org/docs/apache-airflow-providers-alibaba/2.1.0/index.html

then the user can delete part of the URL and go to:

https://airflow.apache.org/docs/apache-airflow-providers-alibaba/

When we do not have this redirect, the user will see a 404 error.

In other words, all directories the user has access to should have an index.html file to improve UX.

@potiuk
Copy link
Member

potiuk commented Nov 25, 2022

Approved to see.

@eladkal
Copy link
Contributor

eladkal commented Dec 20, 2022

@Bowrna are you still working on this PR?

@Bowrna
Copy link
Author

Bowrna commented Dec 21, 2022

@eladkal I didn't work on this for a long time as I was working on other PR. I will start working on this PR and one another in this weekend as I have some time due to the upcoming holiday week.

@Bowrna
Copy link
Author

Bowrna commented Dec 25, 2022

As I am currently working on this PR, I will post the parts that I have completed and parts that are pending to keep on track.
Following are the commands supported in the site.sh bash script that's being moved to python:

  1. install-node-deps (successfully moved to python as of now)
  2. preview-landing-pages (successfully moved to python as of now)
  3. build-site (successfully moved to python as of now)
  4. build-landing-pages (successfully moved to python as of now)
  5. check-site-links (only pending part)
  6. prepare-theme (successfully moved to python as of now)
  7. lint-js (successfully moved to python as of now)
  8. lint-css (successfully moved to python as of now)

In the case of check-site-links, I am currently invoking the check-links.sh bash script via Python code. So I have to bring in the functionaly added via check-links.sh into Python part to close this part.

@Bowrna
Copy link
Author

Bowrna commented Dec 31, 2022

@potiuk @mik-laj I am considering to use beautiful soup library, to extract the links from html files for check-site-links command. Is it fine or is there any better way to handle it? If there is please let me know.

Edit: I will check if i can pull external links via regex pattern before trying to use bs4

Thanks. :)
I hope to finish this before this year end 🤞

@Bowrna
Copy link
Author

Bowrna commented Jan 7, 2023

@potiuk @mik-laj
https://github.com/apache/airflow-site/blob/main/landing-pages/check-links.sh#L65

mapfile -t links < <(printf '%s\n' "${pages[@]}" | xargs -n 1 lynx -listonly -nonumbers -dump | grep -v " ")
Here to check the site links, we rely on the lynx command to get the clean url from the html file. Could i use the same command in python or is there any other way that helps to get the url in clean way?

@mik-laj
Copy link
Member

mik-laj commented Jan 7, 2023

We don't use this check, so I think we can drop it for now.

@Bowrna
Copy link
Author

Bowrna commented Jan 7, 2023

@mik-laj Could you tell me if the check-site-links command in site.sh is used or not? Because that command uses this script file and here they have used lynx to extract URL?

Could you give me more explanation on that part?

@Bowrna
Copy link
Author

Bowrna commented Jan 7, 2023

Also, can this code be reviewed? I will try fixing this PR and getting it merged to main.

@Bowrna
Copy link
Author

Bowrna commented Jan 14, 2023

Hello all :) do I have any update on this PR?

@potiuk
Copy link
Member

potiuk commented Jan 16, 2023

Hmm, I honestly think we should not invest too much in it. We are discussing about switching to Pelikan which is going to change the process completely anyway.

@potiuk
Copy link
Member

potiuk commented Jan 16, 2023

There are a number of problems with dependencies the current approach has (and it caused us some problems in the recent past) so I think this is somethign we have to do anyway soon.

@Bowrna
Copy link
Author

Bowrna commented Jan 17, 2023

@potiuk Thanks for the update. I am happy to help in switching to Pelikan part :)

@Bowrna
Copy link
Author

Bowrna commented Apr 19, 2023

could i close this PR @potiuk as this is not relevant if switching to Pelican?

@potiuk potiuk closed this Apr 19, 2023
@potiuk
Copy link
Member

potiuk commented Apr 19, 2023

yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite ./site.sh to Python
6 participants