Is the api source code available? #2

Open
caseyfitz opened this issue Jan 5, 2021 · 6 comments

Comments

caseyfitz commented Jan 5, 2021

I'm interested in the source for the tool itself, e.g., the Dockerfile and the scripts run by the container.

@caseyfitz
Author

More specifically, inside the container we find many files that aren't in this repository:

└── ubuntu
    ├── application.py
    ├── config.py
    ├── data
    │   ├── CombinedDictionaryMap.json
    │   ├── CombinedNGRAMMatrixCSR.pkl
    │   ├── FOSIndex.json
    │   ├── FOSMAP.json
    │   ├── OSDG-Ontology.json
    │   ├── SdgThresholds.json
    │   ├── Spacy_bigram_th1.md
    │   ├── spacy_idf_th1.json
    │   └── spacy_trigram_th1.md
    ├── Dockerfile
    ├── exceptions.py
    ├── get_data.py
    ├── index_html
    ├── LICENSE
    ├── __pycache__
    │   ├── config.cpython-37.pyc
    │   ├── exceptions.cpython-37.pyc
    │   ├── sdgFinder.cpython-37.pyc
    │   └── utils.cpython-37.pyc
    ├── README.md
    ├── requirements.txt
    ├── sampleAPICall.py
    ├── sdgFinder.py
    ├── setup.sh
    └── utils.py

Are these maintained in a public repository?

@lukas-pkl
Contributor

@caseyfitz
Thanks for your question!
The answer is: not yet, but we will put these in a public repo by the end of the month, so they should be online from 1st February 2021. However, we will move the repository to a new address (https://github.com/osdg-ai/osdg-tool), and the full source code will be posted there.
We are currently cleaning and refactoring the code so that it is more readable and user-friendly.

@caseyfitz
Author

@lukas-pkl, looking forward to it, thanks!

@caseyfitz
Author

@lukas-pkl a quick related question (then I'll make sure to close).

I'm wondering how to interpret the "quota_9" field in the file SdgThresholds.json, which has the form

{
    "SDG_1":
        {"LowerTh": 2, "UpperTh": 4, "quota_9": 6},
    "SDG_2":
        {"LowerTh": 2, "UpperTh": 6, "quota_9": 20},
    "SDG_3":
    ....

and which is used in sdgFinder.py to divide the relevance score for each SDG:

            sdg_res_raw_fosNames[key] = plh3

        # Applying .9 quota
        self.sdg_res = sorted(sdg_res_raw_n.items(), key=lambda kv: kv[1] / self.sdgThresholds[kv[0]]['quota_9'], reverse=True)

        self.sdg_res_det = {}

I couldn't find this term referenced in the main repo or the arXiv paper.

Thanks!

@lukas-pkl
Contributor

lukas-pkl commented Jan 7, 2021

@caseyfitz - we are addressing issues like this in our current refactoring.

Basically, quota_9 is a parameter we use to sort the SDGs before producing the output.
One of the issues we faced was that the API sometimes produces too many SDG labels, even with thresholds applied.
As such, we have decided to limit the API output to three SDG labels. We select the top three labels using the quota_9 parameter, which we set by assigning SDG tags to a pool of publications and analyzing the distribution of SDG-FOS'es.
The parameter corresponds to the 90th percentile of that distribution for each SDG, which means that we rank a publication's SDGs by how close they come to this mark.
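
For readers following along, here is a minimal sketch of the ranking described above. It is not the code from sdgFinder.py: the function names `percentile_90` and `top_sdgs`, the raw scores, and the SDG_3 quota are made up for illustration, and the nearest-rank percentile is only one plausible way such a quota could be derived. The SDG_1 and SDG_2 values are the ones quoted earlier in this thread.

```python
import math

# SDG_1 and SDG_2 values are the ones quoted from SdgThresholds.json above;
# the SDG_3 entry is a hypothetical placeholder for illustration.
sdg_thresholds = {
    "SDG_1": {"LowerTh": 2, "UpperTh": 4, "quota_9": 6},
    "SDG_2": {"LowerTh": 2, "UpperTh": 6, "quota_9": 20},
    "SDG_3": {"quota_9": 10},  # hypothetical
}


def percentile_90(counts):
    """Nearest-rank 90th percentile of a list of counts (assumed method)."""
    ordered = sorted(counts)
    rank = math.ceil(9 * len(ordered) / 10)  # 1-based nearest rank
    return ordered[rank - 1]


def top_sdgs(raw_scores, thresholds, limit=3):
    """Rank SDGs by raw score relative to quota_9 and keep the top `limit`."""
    ranked = sorted(
        raw_scores.items(),
        key=lambda kv: kv[1] / thresholds[kv[0]]["quota_9"],
        reverse=True,
    )
    return [sdg for sdg, _ in ranked[:limit]]


# Hypothetical raw scores for a single publication.
raw = {"SDG_1": 5, "SDG_2": 12, "SDG_3": 4}
print(top_sdgs(raw, sdg_thresholds, limit=2))
```

In this made-up example SDG_2 has the highest raw score (12), but it ranks below SDG_1 because 12/20 is further from its quota than 5/6 is. That is the effect of dividing by quota_9 before sorting.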

We are preparing an update to the arXiv paper, which we will present at a conference in July. We will update the arXiv version after the event.

Let me know if anything else comes up!

@caseyfitz
Author

Thank you!
