This project provides a reproducible workflow for analyzing the CORD-19 `metadata.csv` file.
It includes:
- A Jupyter Notebook for interactive exploration
- A Python script for automated analysis
- A Streamlit dashboard for interactive visualization
⚠️ Note: The full CORD-19 dataset is very large. For this project, download only `metadata.csv` from Kaggle (CORD-19 Research Challenge).
Place it inside the `data/` folder as `data/metadata.csv`.
Project structure:
- `requirements.txt`: Python dependencies
- `notebooks/analysis.ipynb`: Jupyter Notebook (exploration & cleaning)
- `analysis.py`: Script version of the analysis
- `streamlit_app/app.py`: Streamlit web application
- `data/metadata.csv`: Dataset (not included in the repo; must be downloaded separately)
- Install dependencies:
pip install -r requirements.txt
- Download the dataset and place `metadata.csv` into the `data/` folder.
- Run the analysis:
jupyter notebook notebooks/analysis.ipynb
# or
python analysis.py
- Launch the Streamlit app:
streamlit run streamlit_app/app.py
Both the notebook and script include error handling for missing files/columns.
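The error handling for missing files and columns could look roughly like the following minimal sketch. It uses only the Python standard library; the function name `load_metadata` and the required column names (`title`, `publish_time`, `journal`) are assumptions for illustration, not taken from the project's code.

```python
import csv
from pathlib import Path

# Assumed column names; adjust to whatever the analysis actually uses.
REQUIRED_COLUMNS = ("title", "publish_time", "journal")


def load_metadata(path="data/metadata.csv", required=REQUIRED_COLUMNS):
    """Hypothetical loader: read metadata.csv into a list of dicts,
    failing early with a clear message if the file or a column is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"{csv_path} not found; download metadata.csv from Kaggle "
            "and place it in the data/ folder"
        )
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list.
        header = reader.fieldnames or []
        missing = [col for col in required if col not in header]
        if missing:
            raise ValueError(
                f"{csv_path} is missing expected column(s): {', '.join(missing)}"
            )
        return list(reader)
```

A script could wrap the call in `try`/`except (FileNotFoundError, ValueError)` and exit with the message. Reading the whole file into a list keeps the sketch simple but is memory-hungry for a file of this size; a real script might stream rows or load only the needed columns (for example with pandas' `read_csv(..., usecols=...)`).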
The Streamlit app supports filtering by publication year and provides:
- Number of publications over time
- Top journals chart
- Word-frequency insights from paper titles
- A preview of the dataset
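The computations behind the year filter and the charts can be sketched as plain functions over the rows produced by a CSV reader. This is a hypothetical, standard-library-only illustration: the helper names, the small stop-word list, and the assumption that `publish_time` begins with a four-digit year are illustrative choices, not taken from `app.py`.

```python
import re
from collections import Counter

# Small illustrative stop-word list (assumption); a real app may use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "from", "into", "are", "was"}


def publish_year(row):
    """Return the leading 4-digit year of publish_time, or None if absent."""
    match = re.match(r"(\d{4})", (row.get("publish_time") or "").strip())
    return int(match.group(1)) if match else None


def filter_by_year(rows, start, end):
    """Keep rows whose publication year lies in the inclusive range [start, end]."""
    kept = []
    for row in rows:
        year = publish_year(row)
        if year is not None and start <= year <= end:
            kept.append(row)
    return kept


def publications_per_year(rows):
    """Count rows per publication year, sorted by year."""
    years = (publish_year(row) for row in rows)
    counts = Counter(year for year in years if year is not None)
    return dict(sorted(counts.items()))


def top_journals(rows, n=10):
    """Return the n most frequent non-empty journal names with their counts."""
    names = ((row.get("journal") or "").strip() for row in rows)
    return Counter(name for name in names if name).most_common(n)


def title_word_frequency(rows, n=20):
    """Count lowercase alphabetic title words longer than two letters,
    skipping stop words, and return the n most common."""
    words = Counter()
    for row in rows:
        for word in re.findall(r"[a-z]+", (row.get("title") or "").lower()):
            if len(word) > 2 and word not in STOPWORDS:
                words[word] += 1
    return words.most_common(n)
```

In a Streamlit app, one possible wiring is an `st.slider` for the year range feeding `filter_by_year`, `st.bar_chart` for the per-year and journal counts, and `st.dataframe` for the preview.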