Project Selection and Prioritization Tools

Process overview

  1. Collect all the project information necessary to make a selection, as shown in the example Excel file. It contains the name, start date, end date, involved partners, and focus of each project. To retrieve additional public information on the projects, one possibility is to use web-scraping techniques; here, we give an example using a Python script. Note that other techniques can be used to collect this information, such as a KNIME workflow - see "FAIRplus: D1.2 Selection criteria and guidelines for data sources from IMI projects and EFPIA internal databases" for more information on the KNIME workflow and the selection procedure.

  2. Classify the project data based on the content information. This Python script searches the project content text for a predefined list of disease keywords (taken from the ontology file), which subsequently allows grouping projects by disease keyword. This automated approach is helpful when dealing with a large number of projects, but may not be needed for evaluating only a few projects. The list of disease keywords can be adapted to specific needs; a minimal sketch of the keyword-matching step is shown after this list.

  3. Collect specific technical information using a data survey. This document contains an example of a data survey that was used to understand the data structure in a given project.

  4. Use all the project information and apply a score card. In this document, projects are scored on scientific, societal, and technical aspects; the total score is used to prioritize the projects (a small illustrative scoring example follows this list).
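As a minimal sketch of the keyword-based grouping in step 2 (the column names "term" and "summary" are assumptions for illustration; the file names follow the Output section below, but the repository's own script may differ):

import pandas as pd

# Load project descriptions and the disease keyword vocabulary (TSV)
projects = pd.read_csv("imi2_project_list.tsv", sep="\t")
vocabulary = pd.read_csv("vocabulary.tsv", sep="\t")  # assumed columns: "term", "class"

def match_keywords(text, terms):
    """Return all vocabulary terms that occur in the project text."""
    text = str(text).lower()
    return [term for term in terms if term.lower() in text]

terms = vocabulary["term"].dropna().tolist()
projects["matched_keywords"] = projects["summary"].apply(lambda t: match_keywords(t, terms))
projects.to_csv("imi2_project_group.tsv", sep="\t", index=False)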
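The scoring in step 4 is done in the score card document itself, but conceptually it reduces to summing (optionally weighted) criterion scores per project. A purely hypothetical illustration, not the official score card (criteria names, weights, and example values are placeholders):

# Hypothetical weights per aspect; the real score card defines its own criteria and scale.
weights = {"scientific": 1.0, "societal": 1.0, "technical": 1.0}

projects = [
    {"name": "Project A", "scientific": 4, "societal": 3, "technical": 5},
    {"name": "Project B", "scientific": 5, "societal": 2, "technical": 3},
]

for p in projects:
    p["total"] = sum(weights[k] * p[k] for k in weights)

# Higher total score = higher priority
for p in sorted(projects, key=lambda p: p["total"], reverse=True):
    print(p["name"], p["total"])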

Loading the script

Getting the code locally

Clone the GitHub repository locally using the following command in your terminal:

git clone https://github.com/Fraunhofer-ITMP/IMI-Project-Prioritization

Alternatively, you can use GitHub Desktop to clone the repository.

Creating a virtual environment

  1. Go to the location of your cloned local repository via the terminal and move into the IMI-Project-Prioritization folder using the following command:
cd IMI-Project-Prioritization
  2. Create a virtual environment. If you are using conda, you can use the following command:
conda create --name virtualenv python=3.9
  3. Activate your virtual environment:
conda activate virtualenv
  4. Install all the requirements into your virtual environment:
pip install -r requirements.txt
  5. Install the model used for the analysis into your virtual environment:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_md-0.4.0.tar.gz
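
After the installation you can verify that the model loads correctly, for example in a Python shell (a quick check only, not part of the repository's scripts; the sample sentence is illustrative):

import spacy

# Load the scispaCy biomedical model installed above
nlp = spacy.load("en_core_sci_md")
doc = nlp("The project focuses on Alzheimer's disease biomarkers.")
print([ent.text for ent in doc.ents])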

Pre-requisite installations

Data files:

  1. Project information file - File containing basic information on the projects that is later used for scraping public sources.
  2. Vocabulary file - TSV file mapping terms to their classes.

Prior to running the scripts, please make sure you manually download and install the chromedriver matching your Google Chrome version and update the path to the executable in constants.py.

For Linux and Mac users, an additional step is required to add chromedriver to the path, using the following command in your terminal:

mv chromedriver /usr/local/bin

For Mac M1 users, also remove the quarantine attribute:

sudo xattr -d com.apple.quarantine /usr/local/bin/chromedriver

For Mac users, be sure to authorize the execution of the binary under System Preferences -> Security & Privacy -> General.

Note that you need admin rights on your computer to do this.
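
As a rough sketch (not the repository's actual code), the scraping script typically wires the chromedriver path into Selenium as shown below; the constant name and URL are illustrative placeholders, and the real path is the one you set in constants.py:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Path to the chromedriver executable (illustrative; use the value configured in constants.py)
CHROMEDRIVER_PATH = "/usr/local/bin/chromedriver"

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH))
driver.get("https://www.imi.europa.eu/projects-results")  # example page to scrape
html = driver.page_source  # raw page content to parse for project details
driver.quit()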

Running the script

Run the main.py file from your terminal as follows:

python src/main.py

Output

Upon running the code, you will get two files as outputs:

  1. imi2_project_list.tsv - A TSV file with all the information scraped from the IMI website for the projects of interest.
  2. imi2_project_group.tsv - A TSV file with the keyword-enriched information for the project data in the previous file.
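
Both outputs are plain TSV files and can be inspected with any spreadsheet tool or, for example, with pandas:

import pandas as pd

# Load and preview the two output files
projects = pd.read_csv("imi2_project_list.tsv", sep="\t")
groups = pd.read_csv("imi2_project_group.tsv", sep="\t")
print(projects.head())
print(groups.head())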

About

Code and data repository for selection of datasets for FAIRification in Drug Discovery and Development.
