COVID-19 Research Dataset Analysis

Overview

This project provides an interactive data exploration and visualization platform for analyzing the CORD-19 research dataset. It consists of a Jupyter notebook for exploratory data analysis and a Streamlit web application for interactive visualizations and insights.

Purpose

The COVID-19 Research Dataset Analysis project aims to:

Facilitate exploration of COVID-19 research publications metadata
Identify publication trends and patterns over time
Analyze top contributing journals and institutions
Provide accessible visualizations for research insights
Enable data-driven understanding of the scientific response to COVID-19

Architecture

Jupyter Notebook (COVID.ipynb)

Purpose: Exploratory data analysis and initial insights
Data Flow: Load → Clean → Transform → Analyze → Visualize
Output: Static charts and analysis summaries
Audience: Data scientists, researchers, analysts

Streamlit Application (streamlit_app.py)

Purpose: Interactive web-based dashboard
Architecture: Single-page app with tabbed interface
Data Caching: Optimized performance with Streamlit's caching
Deployment: Locally runnable web application

Features

Notebook Features

Data Loading & Cleaning: Handles CORD-19 metadata with proper missing value treatment
Time-based Analysis: Publication trends over time (2019-2024)
Journal Analysis: Top publishing journals and word frequency patterns
Visualizations: Line plots, bar charts, word clouds, frequency plots
Statistical Summaries: Descriptive statistics for numerical features
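
The word frequency analysis on titles could be sketched roughly as follows. This is an illustrative outline, not the notebook's actual code: the function name `title_word_counts`, the small `STOP_WORDS` set, and the sample titles are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the notebook's own list may differ.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_word_counts(titles, top_n=10):
    """Return the top_n most frequent non-stop words across the given titles."""
    counts = Counter()
    for title in titles:
        if not isinstance(title, str):
            continue  # skip missing titles (e.g. None or NaN)
        # Lowercase and keep alphabetic runs only, so "COVID-19" yields "covid"
        words = re.findall(r"[a-z]+", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up titles for illustration
sample_titles = [
    "Clinical features of COVID-19 patients",
    "SARS coronavirus and COVID-19 transmission",
    None,
]
print(title_word_counts(sample_titles, top_n=3))
```

The resulting (word, count) pairs can feed either a bar chart or a word cloud.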

Application Features

Dashboard Layout: Multi-tab interface (Overview, Publications, Journals, Word Analysis)
Interactive Controls: Sliders for customizable data display
Metric Cards: Key statistics display (total papers, time range, word counts)
Dynamic Charts: Matplotlib and Streamlit built-in charts
Data Table Views: Scrollable tables for detailed examination

Usage

Prerequisites

Python 3.7+
Internet connection for dependency installation

Installation

Create Virtual Environment:

python -m venv venv
venv\Scripts\activate  # Windows
source venv/bin/activate  # macOS/Linux

Install Dependencies:
```
pip install -r requirements.txt
```

Data Preparation

Place the metadata.csv file in the project root directory
Ensure the CSV contains the CORD-19 dataset columns (cord_uid, title, abstract, publish_time, authors, journal)
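
A quick way to confirm the file has the expected columns before running anything is to read only its header row. The sketch below uses the standard library; the helper name `missing_columns` is hypothetical and the sample row is made up.

```python
import csv
import io

REQUIRED_COLUMNS = {"cord_uid", "title", "abstract", "publish_time", "authors", "journal"}


def missing_columns(csv_file):
    """Return the required columns absent from the CSV header, sorted by name."""
    header = next(csv.reader(csv_file), [])  # empty file -> empty header
    return sorted(REQUIRED_COLUMNS - set(header))


# In-memory example; for the real file, pass
# open("metadata.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "cord_uid,title,publish_time,journal\n"
    "a1,Some title,2020-03-01,Journal A\n"
)
print(missing_columns(sample))
```

An empty list means every required column is present.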

Running the Notebook

jupyter notebook COVID.ipynb

Process Flow:

Import libraries
Load and examine data structure
Perform data cleaning
Transform dates and add derived columns
Generate analyses (publications by year, top journals, word frequencies)
Create visualizations
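
The steps above might look roughly like this in pandas. It is a minimal sketch under stated assumptions, not the notebook's actual code: the `prepare` function, the derived `year` column, the "Unknown" label for missing journals, and the in-memory sample rows are all illustrative choices.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for metadata.csv
SAMPLE_CSV = """cord_uid,title,abstract,publish_time,authors,journal
a1,Title one,Abstract one,2020-03-15,Author A,Journal A
a2,Title two,,2020-07-01,Author B,Journal A
a3,Title three,Abstract three,2021-01-20,Author C,Journal B
a4,,Abstract four,2021-05-05,Author D,Journal B
a5,Title five,Abstract five,not-a-date,Author E,
"""


def prepare(df):
    """Clean the metadata and add a publication-year column."""
    df = df.dropna(subset=["title"]).copy()           # drop rows without a title
    df["journal"] = df["journal"].fillna("Unknown")   # label missing journals
    df["abstract"] = df["abstract"].fillna("")
    # Unparseable dates become NaT instead of raising an error
    df["publish_time"] = pd.to_datetime(df["publish_time"], errors="coerce")
    df = df.dropna(subset=["publish_time"])
    return df.assign(year=df["publish_time"].dt.year)


df = prepare(pd.read_csv(io.StringIO(SAMPLE_CSV)))
papers_per_year = df["year"].value_counts().sort_index()
top_journals = df["journal"].value_counts().head(10)
print(papers_per_year)
print(top_journals)
```

Both results are pandas Series, so each can be passed to a plotting call such as `papers_per_year.plot(kind="line")`.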

Running the Streamlit App

streamlit run streamlit_app.py

Usage Guide:

Overview Tab: View dataset statistics and sample publications
Publications Tab: Analyze temporal trends and publication counts
Journals Tab: Explore top publishing journals with customizable counts
Word Analysis Tab: Examine title word frequencies and cloud visualizations

Interactive Elements

Adjust slider controls to view different numbers of results
Use multiselect filters for year-based analysis
Resize browser window for responsive layout
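
To show how the slider and multiselect values could drive a chart, here is a small sketch of the filtering step behind them. The function name `top_journals_for_years` and the `year` column are assumptions for illustration. In the app, the two arguments would presumably come from `st.multiselect` and `st.slider`.

```python
import pandas as pd


def top_journals_for_years(df, years, top_n):
    """Count papers per journal for the selected years and keep the top_n."""
    # An empty selection is treated as "all years"
    selected = df[df["year"].isin(years)] if years else df
    return selected["journal"].value_counts().head(top_n)


# Made-up rows for illustration
df = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2021],
    "journal": ["Journal A", "Journal B", "Journal A", "Journal A", "Journal C"],
})
print(top_journals_for_years(df, years=[2021], top_n=1))
```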

Data Insights

The notebook and the Streamlit app reveal key patterns in COVID-19 research:

Accelerated publication growth from 2020 onwards
Medical and scientific journals as primary publishers
Thematic focus on COVID, SARS, viral topics in titles
Increasing publication volume reflecting pandemic urgency

Technical Notes

Dependencies: pandas, matplotlib, seaborn, streamlit, wordcloud
Data Size: Optimized for datasets of 1M+ rows
Performance: Streamlit caching prevents redundant data processing
Compatibility: Tested on Windows 11 with Python 3.9-3.13

For detailed code documentation, see inline comments in respective files.
