Create a new version of imdb.csv and update the examples to use it #721

Closed
1 task
tatiana opened this issue Aug 23, 2022 · 0 comments · Fixed by #728

tatiana commented Aug 23, 2022

Context

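Our smallest Astro SDK example uses the imdb.csv dataset, whose column names are currently capitalized (Title, Rating, Genre). The example works with SQLite, where column names are case-insensitive, but the same transform select statement fails on Postgres: the loaded table keeps the capitalized column names, while Postgres folds the unquoted identifiers in the query to lowercase.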

Example DAG

@aql.transform()
def get_top_ten_rated_films(input_table: Table):
    return """
            SELECT Title, Rating, Genre
            FROM {{ input_table }}
            ORDER BY Rating DESC
            LIMIT 10;
        """

with DAG(...):
    load_imdb_movies = aql.load_file(
        task_id='load_imdb_movies',
        input_file=File(
            path='https://raw.githubusercontent.com/astronomer/astro-sdk/main/tests/data/imdb.csv'
        ),
        output_table=Table(
            name='imdb_movies',
            conn_id=CONN_ID,
        ),
    )

    top_ten_rated_films = get_top_ten_rated_films(
        input_table=load_imdb_movies,
        output_table=Table(
            name='top_rated',
            conn_id=CONN_ID,
        ),
    )

Error raised when the DAG runs against Postgres:

[2022-08-23, 10:17:35 UTC] {taskinstance.py:1910} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1705, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 716, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column "title" does not exist
LINE 2:             SELECT Title, Rating, Genre
                           ^
HINT:  Perhaps you meant to reference the column "imdb_movies.Title".

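The workaround is to double-quote the column names in the select statement so that Postgres preserves their case (the column in ORDER BY needs the same quoting):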

            SELECT "Title", "Rating", "Genre"
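As a rough illustration of why the unquoted query passes in the SQLite example, here is a minimal sketch using Python's built-in sqlite3 module. The table contents and the `run_query` helper are made up for the illustration and are not part of the Astro SDK. It shows only the SQLite side: quoted and unquoted column references resolve to the same columns there, so the quoted form from the workaround should remain safe for the SQLite example too.

```python
import sqlite3


def run_query(query: str) -> list:
    """Run a query against a throwaway in-memory table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column names are created with capital letters, as in imdb.csv.
        conn.execute(
            'CREATE TABLE imdb_movies ("Title" TEXT, "Rating" REAL, "Genre" TEXT)'
        )
        conn.execute(
            "INSERT INTO imdb_movies VALUES "
            "('Film A', 9.0, 'Drama'), ('Film B', 8.0, 'Comedy')"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Unquoted identifiers: SQLite matches column names case-insensitively.
unquoted = run_query(
    "SELECT Title, Rating FROM imdb_movies ORDER BY Rating DESC"
)
# Quoted identifiers: the form the workaround uses for Postgres.
quoted = run_query(
    'SELECT "Title", "Rating" FROM imdb_movies ORDER BY "Rating" DESC'
)
print(unquoted == quoted)
```

On Postgres the unquoted variant is the one that fails, as the traceback above shows, because `Title` is looked up as `title`.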

Acceptance criteria

  • Create a new version of imdb.csv with lowercase column names and update the examples to use it. Adding a new file, rather than changing the existing imdb.csv, keeps the change backwards-compatible.
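One possible way to produce the lowercase-column file is sketched below, using only the standard csv module. The `lowercase_header` helper and the sample rows are hypothetical. The real imdb.csv may have more columns than the three referenced in this issue, and the name of the new file is left open here.

```python
import csv
import io


def lowercase_header(csv_text: str) -> str:
    """Return the CSV text with only its header row lowercased."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        # Only the first row (the header) is changed; data rows pass through.
        writer.writerow([name.lower() for name in row] if index == 0 else row)
    return out.getvalue()


# Made-up sample standing in for the contents of imdb.csv.
sample = "Title,Rating,Genre\nSome Film,8.5,Drama\n"
converted = lowercase_header(sample)
print(converted)
```

With lowercase headers the loaded table has lowercase column names, so the unquoted select statement in the example would match what Postgres looks up.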

tatiana changed the title from "Create a new version of imdb.csv with lowercase column names and update the examples to use it" to "Create a new version of imdb.csv and update the examples to use it" on Aug 23, 2022
tatiana self-assigned this on Aug 23, 2022
kaxil added this to the 1.0.1 milestone on Aug 23, 2022
kaxil pushed a commit that referenced this issue on Aug 23, 2022