Allow /src scripts to receive data files as command line arguments #167

Closed
Irio opened this issue Dec 14, 2016 · 12 comments

Comments

@Irio
Collaborator

Irio commented Dec 14, 2016

The majority of scripts located in src/ process datasets stored in hard-coded locations, like data/cnpj-info.xz in src/fetch_cnpj_info.py#L10. Given how much has changed since they were created, we expect them to receive data paths as command line arguments, as in python src/fetch_cnpj_info.py data/2016-12-06-cnpj-info.xz.
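A minimal sketch of what this could look like with Python's standard argparse module, assuming each script keeps argument parsing separate from its processing logic. The function name parse_args is hypothetical and not taken from the repository.

```python
import argparse


def parse_args(argv=None):
    """Read the dataset path from the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description='Process a dataset given its path instead of a hard-coded location.'
    )
    # Positional argument replacing a hard-coded path such as 'data/cnpj-info.xz'
    parser.add_argument(
        'path',
        help='path to the dataset, e.g. data/2016-12-06-cnpj-info.xz',
    )
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)
```

A script would call parse_args() from its if __name__ == '__main__': block and use args.path wherever the hard-coded location was used before.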

@marcusrehm
Contributor

Hi @Irio! I'm working on exactly that. I spoke with @cuducos about issue #67, where I also need to use fetch_cnpj_info with the amendments dataset.

Actually, I was also thinking about passing the column containing the CNPJs as a parameter, so the call would be src/fetch_cnpj_info.py './data/2016-12-06-cnpj-info.xz' 'cnpj'. What do you think?
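A possible sketch of the interface proposed here, with the column as an optional second positional argument. The name parse_args is hypothetical, and defaulting the column to 'cnpj' is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical CLI: a dataset path plus the column holding the CNPJs."""
    parser = argparse.ArgumentParser(description='Fetch CNPJ info for a dataset.')
    parser.add_argument(
        'path',
        help='dataset path, e.g. ./data/2016-12-06-cnpj-info.xz',
    )
    # nargs='?' makes the column optional; 'cnpj' is an assumed default
    parser.add_argument(
        'column',
        nargs='?',
        default='cnpj',
        help="column containing the CNPJs (defaults to 'cnpj')",
    )
    return parser.parse_args(argv)
```

With this, both src/fetch_cnpj_info.py ./data/2016-12-06-cnpj-info.xz and src/fetch_cnpj_info.py ./data/2016-12-06-cnpj-info.xz cnpj would resolve the column to 'cnpj'.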

@marcusrehm
Contributor

Hi @Irio

I refactored fetch_cnpj_info.py, so now we can call it passing the filename and the column containing the CNPJs.

@cuducos
Collaborator

cuducos commented Dec 15, 2016

Sorry if I missed anything… but I don't think passing the column we need is what I had in mind. When I read OP's description, I think he meant the file to be loaded (for example 2016-08-08-companies.xz or 2016-12-06-companies.xz). Also, I'm afraid that's not what I meant in #67, @marcusrehm (but I'm gonna clarify that on the issue page).

@marcusrehm
Contributor

@cuducos
Collaborator

cuducos commented Feb 3, 2017

@marcusrehm PR #185 addresses this issue when it comes to the company scripts, but not for all scripts inside the src/ directory ;)

@marcusrehm
Contributor

Yes, @cuducos, you're right! =)

@martini97

martini97 commented Apr 28, 2017

I would like to know specifically what the idea behind this issue is. I've looked through some files in src/, and some of them just download data. Would you like the script to specify the path where the data is to be saved? Or would you like only the scripts that read files to receive arguments? Anyway, I think it would be nice to post a roadmap with the files that you'd like to change. Thanks.

@cuducos
Collaborator

cuducos commented Apr 29, 2017

Hi @martini97,

would you like the script to specify the path where the data is to be saved?

No, what would actually be interesting is specifying via the command line the files they read (as I commented earlier).

Anyway, I think it would be nice to post a roadmap with the files that you'd like to change.

Well… the files from src/ that read other files:

Or, to put it another way, every script except:

  • backup_data.py
  • utils.py
  • And fetch_sex_places.py (already written this way)

@marcusrehm
Contributor

Hi @martini97! I think what @cuducos suggested was commented here. He's asking for a sort of mapping like this:

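# just an example; base_file_name is a placeholder value
base_file_name = 'amendments'  # hypothetical: dataset name taken from the input file
cols = {'amendments': 'beneficiary', 'other_dataset': 'something_else'}
cnpj_col = cols.get(base_file_name, 'cnpj')  # fall back to 'cnpj' for unlisted datasets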

So when the script receives a file as an argument, it can grab the data (CNPJs) from the referenced column and do the job.
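A sketch of that step, assuming the datasets are xz-compressed CSV files with a header row (as the .xz names in this thread suggest). read_cnpjs is a hypothetical helper that uses only the standard library (lzma and csv), not necessarily what the repository uses.

```python
import csv
import lzma


def read_cnpjs(path, column):
    """Return the sorted, unique, non-empty values of `column` (hypothetical helper).

    Assumes `path` points to an xz-compressed CSV file with a header row.
    """
    cnpjs = set()
    # Text mode ('rt') decompresses transparently; newline='' is what csv expects
    with lzma.open(path, 'rt', encoding='utf-8', newline='') as handle:
        for row in csv.DictReader(handle):
            # A missing column or empty cell yields '' and is skipped
            value = (row.get(column) or '').strip()
            if value:
                cnpjs.add(value)
    return sorted(cnpjs)
```

Deduplicating before fetching avoids requesting the same company twice when it appears in many rows.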

It is already done in fetch_cnpj_info.py for the following datasets:

datasets_cols = {'reimbursements': 'cnpj_cpf',
                 'current-year': 'cnpj_cpf',
                 'last-year': 'cnpj_cpf',
                 'previous-years': 'cnpj_cpf',
                 'amendments': 'amendment_beneficiary'}
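One way to pick the column from that mapping, assuming file names follow the YYYY-MM-DD-<dataset>.xz pattern seen in this thread (e.g. 2016-12-06-cnpj-info.xz). dataset_name and cnpj_column are hypothetical helpers, and falling back to 'cnpj' for unlisted datasets is an assumption.

```python
import os
import re

# Mapping quoted above: dataset name -> column holding the CNPJs
DATASETS_COLS = {
    'reimbursements': 'cnpj_cpf',
    'current-year': 'cnpj_cpf',
    'last-year': 'cnpj_cpf',
    'previous-years': 'cnpj_cpf',
    'amendments': 'amendment_beneficiary',
}

# Assumed naming convention: an optional YYYY-MM-DD- date prefix
DATE_PREFIX = re.compile(r'^\d{4}-\d{2}-\d{2}-')


def dataset_name(path):
    """Strip directory, extensions and date prefix from a dataset path."""
    base = os.path.basename(path)
    name = base.split('.', 1)[0]  # drops .xz as well as .csv.xz
    return DATE_PREFIX.sub('', name)


def cnpj_column(path, default='cnpj'):
    """Look up the CNPJ column for a dataset path, with a fallback."""
    return DATASETS_COLS.get(dataset_name(path), default)
```

For example, cnpj_column('data/2016-12-06-cnpj-info.xz') returns 'cnpj' because cnpj-info is not in the mapping, while a hypothetical data/2016-12-06-amendments.xz would map to 'amendment_beneficiary'.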

Is that right @cuducos ?

@cuducos
Collaborator

cuducos commented Apr 29, 2017

👍

Irio pushed a commit that referenced this issue Feb 27, 2018
…s-friendly-name

Add human friendly name for irregular companies classifier
@willianpaixao
Contributor

@Irio @cuducos is this still an issue?

@cuducos
Collaborator

cuducos commented Oct 4, 2018

Closed because this src/ folder is not in use anymore.

@cuducos cuducos closed this as completed Oct 4, 2018

5 participants