This project is a Streamlit web application for exploring the CORD-19 metadata dataset. It allows users to search research papers, filter by publication year, view publication trends, explore top journals, and generate word clouds of paper titles.
- Dataset Preview: Displays a sample of the metadata for quick inspection.
- Search Tool: Enter a keyword to search through titles and abstracts.
- Year Filter: Filter publications by year and see how many papers were published.
- Publications Over Time: Line chart showing the trend of publications by year.
- Top Journals: Bar chart of the most common journals in the dataset.
- Word Cloud: Visualization of the most frequent words in paper titles.
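
The search, year filter, trend, and top-journal features above can be sketched with plain pandas. This is a minimal illustration, not the app's actual code: the column names (`title`, `abstract`, `publish_time`, `journal`) and all function names are assumptions, and the tiny in-memory DataFrame stands in for the real CSV.

```python
import pandas as pd

# Tiny stand-in for the metadata. Column names are assumed, not taken
# from the app's source.
df = pd.DataFrame(
    {
        "title": [
            "Coronavirus spike protein structure",
            "Vaccine trial results",
            "Influenza surveillance",
        ],
        "abstract": ["We study the spike protein.", None, "Seasonal vaccine coverage."],
        "publish_time": ["2020-03-01", "2021-06-15", "2019"],
        "journal": ["Nature", "The Lancet", "Nature"],
    }
)


def add_year(frame):
    """Return a copy with a nullable integer 'year' column."""
    out = frame.copy()
    # Slice the first four characters so both full dates and bare years
    # ("2019") are handled; unparseable values become missing.
    years = pd.to_numeric(out["publish_time"].astype(str).str[:4], errors="coerce")
    out["year"] = years.astype("Int64")
    return out


def search(frame, keyword):
    """Case-insensitive substring match against title or abstract."""
    in_title = frame["title"].str.contains(keyword, case=False, na=False, regex=False)
    in_abstract = frame["abstract"].str.contains(keyword, case=False, na=False, regex=False)
    return frame[in_title | in_abstract]


def filter_by_year(frame, year):
    """Keep only the rows published in the given year."""
    return frame[frame["year"] == year]


def papers_per_year(frame):
    """Count papers per year, ordered by year (input for a line chart)."""
    return frame["year"].value_counts().sort_index()


def top_journals(frame, n=10):
    """Return the n most frequent journals (input for a bar chart)."""
    return frame["journal"].value_counts().head(n)


df_with_year = add_year(df)
print(search(df_with_year, "vaccine")[["title", "year"]])
print(papers_per_year(df_with_year))
print(top_journals(df_with_year, n=2))
```

Passing `na=False` keeps rows with a missing title or abstract from raising errors or matching by accident, and `regex=False` treats the keyword as literal text, so user input such as `(` does not break the search.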
- metadata_sample.csv → a smaller, shareable sample (used for GitHub/Streamlit Cloud).
- metadata.csv → the full dataset (~1.5 GB). ⚠️ This file is very large and should not be pushed to GitHub.
- The app will fall back to the sample file if the full dataset is unavailable.
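
One way the fallback described above could work is sketched below. The helper name `pick_dataset` is hypothetical, and the app's real loading code may differ; only the two file names come from this README.

```python
from pathlib import Path


def pick_dataset(full="metadata.csv", sample="metadata_sample.csv"):
    """Return the full dataset path if present, otherwise the sample path.

    Hypothetical helper: prefers the large file and falls back to the
    shareable sample when the full dataset is not on disk.
    """
    for candidate in (Path(full), Path(sample)):
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"Neither {full} nor {sample} was found")
```

The returned path can then be passed to `pandas.read_csv`, for example `pd.read_csv(pick_dataset())`.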
- Clone the repository

      git clone https://github.com/PreciousAnagwu/Week-8-Python-Data-Frame-Assignment.git

- Create a virtual environment (recommended)

      python -m venv venv
      source venv/bin/activate   # Mac/Linux
      venv\Scripts\activate      # Windows

- Install dependencies

      pip install -r requirements.txt

- Run the Streamlit app

      streamlit run streamlit_app.py

Main Python libraries used:
- pandas
- streamlit
- matplotlib
- seaborn
- wordcloud
- (See requirements.txt for the full list.)
You can view the live app here: 👉 Live Demo
- CORD-19 dataset provided by the Allen Institute for AI.
- Built as part of the PLP Academy Week 8 Python Assignment.