HTML of Pipeline example in tutorial incomplete #21457
Comments
Thanks for opening your first issue here! Be sure to follow the issue template!
Also: you need to install the postgres provider -- that is also missing from that tutorial: https://airflow.apache.org/docs/apache-airflow-providers-postgres/2.4.0/#installation
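(For reference, with a plain pip-based install that is typically a single command, using the package name from the linked page; in a Docker-based setup the package needs to go into the container image instead:)

pip install apache-airflow-providers-postgres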
Thanks! But I actually already have the provider installed, since I just thought that
The closest module is:
Oh sorry,
One last thing:
For me (someone who hadn't worked with Postgres before) it wasn't trivial to get the example to run, because I didn't know how to explicitly create the database tables. Finally, I did it with the following code:

import requests
from airflow.providers.postgres.hooks.postgres import PostgresHook


def get_data():
    # NOTE: configure this as appropriate for your airflow environment
    data_path = "/opt/airflow/dags/files/employees.csv"
    url = "https://raw.githubusercontent.com/apache/airflow/main/docs/apache-airflow/pipeline_example.csv"

    # download the example CSV and write it to the local file
    response = requests.request("GET", url)
    with open(data_path, "w+") as file:
        file.write(response.text)

    # create the target tables, then bulk-load the CSV into the temp table
    postgres_hook = PostgresHook(postgres_conn_id="postgres_default")
    conn = postgres_hook.get_conn()
    cur = conn.cursor()
    cur.execute(
        '''
        CREATE TABLE "Employees"
        (
            "Serial Number" NUMERIC PRIMARY KEY,
            "Company Name" TEXT,
            "Employee Markme" TEXT,
            "Description" TEXT,
            "Leave" INTEGER
        );
        CREATE TABLE "Employees_temp"
        (
            "Serial Number" NUMERIC PRIMARY KEY,
            "Company Name" TEXT,
            "Employee Markme" TEXT,
            "Description" TEXT,
            "Leave" INTEGER
        );
        '''
    )
    with open(data_path, "r") as file:
        cur.copy_expert(
            "COPY \"Employees_temp\" FROM STDIN WITH CSV HEADER DELIMITER AS ',' QUOTE '\"'",
            file,
        )
    conn.commit()

I don't know if this is the intended/optimal way to create the DB, but the DAG run didn't fail. Maybe it would be easier for bloody beginners like me if the part that creates the DB explicitly were included in the tutorial's code.
Thanks for posting those. But I will close the ticket, because the main reason you see the difference is that you are looking at the "stable" version of the docs, which is published together with each Airflow release (currently 2.2.3). So what is published as the 2.2.3 version of the docs (you can see the version in the top-left selection of the page) is this: https://github.com/apache/airflow/blob/2.2.3/docs/apache-airflow/tutorial.rst
I actually looked through all documentation-only changes we had in
This is indeed wrong. The easiest way for you to fix it is to go to the tutorial page and click "Suggest a change on this page" (bottom-right of the page). It will open the "main" version of the page for editing and you will be able to easily create a PR (even more easily than creating this issue was). This way you can become one of the > 1900 contributors to Airflow (because Airflow is created by contributors like you). It would be a fantastic first contribution to correct it!
You should follow the Quick Start if you are a bloody beginner; that is who it is intended for. It explains what happens and what is going on, including the steps to initialize the DB.
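(For reference, the standalone quick start boils that initialization down to roughly the shell session below; the user details are placeholders and the quick start page itself is the authoritative reference:)

# one-shot local setup (development only): initializes the metadata DB,
# creates an admin user and starts all the Airflow components
airflow standalone

# or step by step:
airflow db init
airflow users create --username admin --role Admin \
    --firstname Anonymous --lastname Admin --email admin@example.org   # prompts for a password
airflow webserver --port 8080
airflow scheduler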
See http://apache-airflow-docs.s3-website.eu-central-1.amazonaws.com/docs/apache-airflow/latest/tutorial.html#pipeline-example for the rendering of the latest docs. The import is still wrong in that example, so if you fancy a PR @KevinYanesG (no need for an issue, you can go straight to a PR fixing it) we can guide you through it.
Absolutely.
Thanks for the extensive responses! I wasn't aware of that. In any case, it's good that you are going to include these details in the next version.
I will be happy to do that :) I hope I can manage to do it before the end of the week.
I actually followed the Docker quick start and not the standalone one. Thus, I didn't see the
Aaah, my bad. Indeed - silly me. Yeah, by all means - adding those would be a nice thing for the tutorial. Thanks for looking at those and for the attempts to clarify it all! BTW, if you add the PRs shortly I am super happy to cherry-pick them to 2.2.4 as well :).
Actually there IS a better way! We have the command airflow db shell.
It drops you into the command-line environment of whichever DB is configured. So you could - I think - even redirect a creation script to that command, which would be much nicer because it could be done as a single command:
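(For illustration, assuming the CREATE TABLE statements live in a hypothetical create_employees.sql, the single command could look like this:)

# feed the table-creation script to the DB client opened by "airflow db shell"
airflow db shell < create_employees.sql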
And in the docker-compose quick start you could likely combine it with https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#running-the-cli-commands
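(Roughly like this, using the airflow.sh wrapper from that page with the same invocation that appears later in this thread; create_employees.sql is again just an illustrative name:)

./airflow.sh airflow db shell < create_employees.sql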
Thanks to you! I tried the airflow db shell approach.
But unfortunately, after trying many different things, I couldn't get it to work, probably because of my limited experience with Airflow / Postgres. A very brief summary of what I tried after deleting the
(I wonder why this error didn't appear when using the
(I also tried to run the SQL commands directly in the shell, but that didn't work either.)
Isn't that because you already created it before :)?
just run the corresponding DROP TABLE statements (in reverse order from creation) at the beginning of the script
I actually already tried with just:

./airflow.sh airflow db shell <<EOF
DROP TABLE EMPLOYEES_TEMP IF EXISTS CASCADE;
DROP TABLE EMPLOYEES IF EXISTS CASCADE;
CREATE TABLE "Employees"
(
    "Serial Number" NUMERIC PRIMARY KEY,
    "Company Name" TEXT,
    "Employee Markme" TEXT,
    "Description" TEXT,
    "Leave" INTEGER
);
CREATE TABLE "Employees_temp"
(
    "Serial Number" NUMERIC PRIMARY KEY,
    "Company Name" TEXT,
    "Employee Markme" TEXT,
    "Description" TEXT,
    "Leave" INTEGER
);
EOF

After running the DAG I still get:

psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "Employees_temp_pkey"
DETAIL: Key ("Serial Number")=(2) already exists.
CONTEXT: COPY Employees_temp, line 2
This error indicates that the input file contains a duplicate (or maybe Postgres has case-sensitive names and the DROP did not actually drop the table)?
Here is the right blog post about it: https://blog.xojo.com/2016/09/28/about-postgresql-case-sensitivity/ It seems that when you initially create Postgres tables with quoted, mixed-case names, you have to quote them the same way in every later statement, because unquoted identifiers are folded to lower case.
(BTW, of course I had no idea about this before I googled it.)
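(A minimal illustration of what that blog post describes, reusing one of the table names from above:)

CREATE TABLE "Employees_temp" ("Serial Number" NUMERIC PRIMARY KEY);

-- the unquoted name is folded to employees_temp, so the table above is NOT dropped
DROP TABLE IF EXISTS EMPLOYEES_TEMP;

-- quoting preserves the original case, so this one actually drops it
DROP TABLE IF EXISTS "Employees_temp";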
Thanks, that's a good hint. I will probably get to it in the beginning/middle of next week.
Ok, I got it to work, thanks again for the suggestions. For some reason the
Describe the issue with documentation
Apache Airflow version: 2.2.3
OS: macOS Catalina Version 10.15.7
Apache Airflow Provider versions:
Deployment:
Web browser: Google Chrome Version 98.0.4758.80 (Official Build) (x86_64)
Also happened with Safari Version 13.1.3 (15609.4.1)
What happened:
The Pipeline example of the tutorial (in the website) looked like the following:
I was wondering why there was so little information about the connection to Postgres and why there were no import statements in the code.
How to solve the problem
Then, I realized that in the GitHub repo the tutorial file was way more extensive:
I marked in red the parts of the tutorial that can't be seen on the website. There are probably more missing details.
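For what it's worth, judging from the hook and the HTTP call used in the example, the missing parts presumably include at least imports along these lines (exact names depend on the tutorial version):

import requests
from airflow.providers.postgres.hooks.postgres import PostgresHook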
Thanks in advance!
Anything else
No response
Are you willing to submit PR?
Code of Conduct