
Make GoogleCloudServiceAccountDictProfileMapping dataset profile argument optional #837

Closed
oliverrmaa opened this issue Feb 6, 2024 · 4 comments · Fixed by #1017
Labels
area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc dbt:run Primarily related to dbt run command or functionality profile:bigquery Related to BigQuery ProfileConfig

Comments

@oliverrmaa
Contributor

Context

Given the issue here and the discussion around it in Slack, it seems reasonable for other profile mapping classes (such as GCP) to make their schema argument (the dataset, in BigQuery terms) optional as well.

dbt's own configuration (profiles.yml, dbt_project.yml, model-level configs, etc.) is meant to provide the dataset. My team's initial instinct was therefore not to pass a dataset here at all. We usually rely on dbt to handle it rather than hard-coding it at the DAG (orchestration) level.

Solution

I believe the right path is to modify this line here and the associated code, similarly to the change made for this issue here.

My colleague and/or I are happy to attempt this modification if there are no objections to the proposed change. Please let us know, thanks!


dosubot bot commented Feb 6, 2024

🤖

That sounds like a great idea! Making the dataset profile argument optional could provide more flexibility for users. Your proposed solution seems like a step in the right direction. We encourage you and your colleague to go ahead and attempt the modification. Thank you for your willingness to contribute to the project!


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@dosubot dosubot bot added area:profile Related to ProfileConfig, like Athena, BigQuery, Clickhouse, Spark, Trino, etc dbt:run Primarily related to dbt run command or functionality profile:bigquery Related to BigQuery ProfileConfig labels Feb 6, 2024
@oliverrmaa
Contributor Author

@dosu-bot I have opened PR #839 to address this issue. It passes all tests and is ready for review. How do I request a review?


dosubot bot commented Feb 7, 2024

🤖

To request a review for your PR #839, you can follow these steps:

  1. Go to your pull request page on GitHub.
  2. On the right side of the page, you'll see a section titled "Reviewers".
  3. Click on the gear icon next to "Reviewers".
  4. From the dropdown, select the person or team you want to review your changes.

If you don't see the "Reviewers" section, it might be because you don't have the necessary permissions. In that case, you can leave a comment on the PR asking for a review from the relevant person or team.

Regarding the proposed change to make the dataset profile argument optional in GoogleCloudServiceAccountDictProfileMapping, it seems like a reasonable suggestion. The dataset is indeed usually handled inside of dbt and making it optional would provide more flexibility. However, please note that any changes to the codebase should be thoroughly tested to ensure they don't introduce any regressions or unexpected behavior.

As for the modification you mentioned, it would involve changing the required_fields list and the airflow_param_mapping dictionary in the GoogleCloudServiceAccountDictProfileMapping class. You would need to remove dataset from required_fields and handle the absence of dataset in airflow_param_mapping. This would likely involve checking if extra.dataset exists before trying to use it.
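The change described above can be sketched in plain Python. This is a simplified stand-in, not the actual Cosmos class. It assumes that the mapping's required fields were `project`, `dataset`, and `keyfile_json` before the change, and that explicit `profile_args` override values read from the connection extras. The class `BigQueryProfileMappingSketch` and its methods are hypothetical. They only illustrate moving `dataset` out of the required list and emitting it only when it is present.

```python
from typing import Any, Dict, Optional


class BigQueryProfileMappingSketch:
    """Hypothetical, simplified stand-in for a Cosmos BigQuery profile mapping.

    Not the real GoogleCloudServiceAccountDictProfileMapping: it only shows
    `dataset` being dropped from the required fields and treated as optional.
    """

    # Assumption: before the change, "dataset" was also listed here.
    required_fields = ["project", "keyfile_json"]
    optional_fields = ["dataset"]

    def __init__(self, conn_extra: Dict[str, Any], profile_args: Optional[Dict[str, Any]] = None):
        self.conn_extra = conn_extra
        self.profile_args = profile_args or {}

    def _get(self, field: str) -> Any:
        # Explicit profile_args take precedence over the Airflow connection extras.
        if field in self.profile_args:
            return self.profile_args[field]
        return self.conn_extra.get(field)

    def profile(self) -> Dict[str, Any]:
        # Fail fast only on fields that are still required.
        missing = [f for f in self.required_fields if self._get(f) is None]
        if missing:
            raise ValueError(f"Missing required profile fields: {missing}")
        profile: Dict[str, Any] = {
            "type": "bigquery",
            "method": "service-account-json",
            "project": self._get("project"),
            "keyfile_json": self._get("keyfile_json"),
        }
        # Check that a dataset exists before using it, instead of assuming it does.
        dataset = self._get("dataset")
        if dataset is not None:
            profile["dataset"] = dataset
        return profile
```

With this shape, omitting `dataset` only moves the responsibility to dbt. Whether dbt then accepts a BigQuery target with no `schema`/`dataset` is a separate question.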

Please ensure that your changes are in line with the Cosmos Contributing Guide and that all tests pass before requesting a review. If you have any further questions or need more guidance, feel free to ask.

For more information, you can refer to the Cosmos codebase, specifically the GoogleCloudServiceAccountDictProfileMapping class, and the Cosmos Contributing Guide.

I hope this helps! If you have any other questions, don't hesitate to ask.


@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 8, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale May 15, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 15, 2024
tatiana pushed a commit that referenced this issue Jun 5, 2024
#1017)

This PR follows the methodology in
#683 by modifying
the GCP `GoogleCloudServiceAccountDictProfileMapping()` `profile_args`
so that dataset is no longer a required argument.

DAG RUN: [screenshot, 2024-06-05]

Co-authored-by: "Ollie Ma" <oliver.zheyi.ma@gmail.com> 
Original PR by @oliverrmaa: #839

Closes #837
tatiana pushed a commit that referenced this issue Jun 6, 2024
@tatiana
Collaborator

tatiana commented Jun 7, 2024

It seems the schema property is mandatory in profiles.yml when using BQ:

Given a dbt project with this profiles.yml:

airflow_db:
  target: dev
  outputs:
    bq:
      type: bigquery
      method: service-account
      project: <project>
      #schema: <dataset>
      threads: 2
      keyfile:  /usr/local/airflow/dbt/jaffle_shop/key.json

and the following model config in dbt_project.yml:

models:
  jaffle_shop:
      +schema: dataset

dbt fails in both 1.5 and 1.8:

Running dbt-core==1.5.0:

dbt ls --target bq --profile airflow_db
10:13:24  Running with dbt=1.5.0
10:13:32  Encountered an error:
Runtime Error
  Credentials in profile "airflow_db", target "bq" invalid: 'schema' is a required property

Running dbt-core==1.8.2:

dbt ls --target bq --profile airflow_db     
10:15:31  Running with dbt=1.8.2
10:15:40  Encountered an error:
Runtime Error
  Credentials in profile "airflow_db", target "bq" invalid: Runtime Error
    Must specify schema
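One way to catch this before dbt runs is a pre-flight check on the rendered target. The sketch below is hypothetical: the function name and the set of fields it checks are not part of Cosmos or dbt. It relies on dbt-bigquery accepting `dataset` as an alias for `schema`, and it reports a missing schema the same way the errors above do.

```python
from typing import Any, Dict, List


def missing_bigquery_target_fields(target: Dict[str, Any]) -> List[str]:
    """Return required fields absent from a BigQuery target dict (hypothetical helper)."""
    missing: List[str] = []
    # dbt-bigquery accepts `dataset` as an alias for `schema`; either satisfies the requirement.
    if not (target.get("schema") or target.get("dataset")):
        missing.append("schema")
    # Assumption: only a few core fields are checked here; dbt validates many more.
    for field in ("type", "method", "project"):
        if not target.get(field):
            missing.append(field)
    return missing


# The "bq" target from the profiles.yml above, with schema commented out.
bq_target = {
    "type": "bigquery",
    "method": "service-account",
    "project": "<project>",
    "threads": 2,
    "keyfile": "/usr/local/airflow/dbt/jaffle_shop/key.json",
}
print(missing_bigquery_target_fields(bq_target))
```

The `+schema` setting in dbt_project.yml does not satisfy this check. It is a model-level config that dbt combines with the target schema (by default via the `generate_schema_name` macro), so the target still needs a base schema.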

I also attempted to do this while

arojasb3 pushed a commit to arojasb3/astronomer-cosmos that referenced this issue Jul 14, 2024
2 participants