PLP-Academy/Python-Week-8-Assignment-Final-Project

COVID-19 Research Dataset Analysis

Overview

This project provides an interactive data exploration and visualization platform for analyzing the CORD-19 research dataset. It consists of a Jupyter notebook for exploratory data analysis and a Streamlit web application for interactive visualizations and insights.

Purpose

The COVID-19 Research Dataset Analysis project aims to:

  • Facilitate exploration of COVID-19 research publications metadata
  • Identify publication trends and patterns over time
  • Analyze top contributing journals and institutions
  • Provide accessible visualizations for research insights
  • Enable data-driven understanding of the scientific response to COVID-19

Architecture

Jupyter Notebook (COVID.ipynb)

  • Purpose: Exploratory data analysis and initial insights
  • Data Flow: Load → Clean → Transform → Analyze → Visualize
  • Output: Static charts and analysis summaries
  • Audience: Data scientists, researchers, analysts

Streamlit Application (streamlit_app.py)

  • Purpose: Interactive web-based dashboard
  • Architecture: Single-page app with tabbed interface
  • Data Caching: Streamlit's caching avoids reloading and reprocessing the data on every rerun
  • Deployment: Locally runnable web application
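
The caching idea can be sketched as follows. This is only an illustration: it uses `functools.lru_cache` from the standard library as a stand-in so it runs without Streamlit installed, and the names `load_rows`, `SAMPLE_CSV`, and `load_count` are hypothetical. In the app itself the analogous decorator would presumably be one of Streamlit's caching decorators such as `st.cache_data`, though which one `streamlit_app.py` uses is an assumption.

```python
import csv
import io
from functools import lru_cache

# Hypothetical in-memory stand-in for metadata.csv so the sketch is self-contained.
SAMPLE_CSV = "cord_uid,title,journal\nabc123,A study of SARS-CoV-2,The Lancet\n"

load_count = 0  # counts how often the expensive parse actually runs


@lru_cache(maxsize=1)
def load_rows(csv_text: str) -> tuple:
    """Parse CSV text once; repeated calls with the same text reuse the result."""
    global load_count
    load_count += 1
    return tuple(csv.DictReader(io.StringIO(csv_text)))


first = load_rows(SAMPLE_CSV)
second = load_rows(SAMPLE_CSV)  # served from the cache, no second parse
```

Because the second call hits the cache, `load_count` stays at 1 no matter how often the dashboard asks for the data.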

Features

Notebook Features

  • Data Loading & Cleaning: Handles CORD-19 metadata with proper missing value treatment
  • Time-based Analysis: Publication trends over time (2019-2024)
  • Journal Analysis: Top publishing journals and word frequency patterns
  • Visualizations: Line plots, bar charts, word clouds, frequency plots
  • Statistical Summaries: Descriptive statistics for numerical features

Application Features

  • Dashboard Layout: Multi-tab interface (Overview, Publications, Journals, Word Analysis)
  • Interactive Controls: Sliders for customizable data display
  • Metric Cards: Key statistics display (total papers, time range, word counts)
  • Dynamic Charts: Matplotlib and Streamlit built-in charts
  • Data Table Views: Scrollable tables for detailed examination
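
As a rough sketch of how the metric-card values might be derived, the function below computes a paper count, a year range, and a title word count from already-loaded records. The record layout (`title` and `year` keys), the function name `summary_metrics`, and the reading of "word counts" as the total number of words across titles are all assumptions made for illustration.

```python
def summary_metrics(records):
    """Return headline statistics for a list of publication records."""
    years = [r["year"] for r in records if r.get("year") is not None]
    return {
        "total_papers": len(records),
        # (earliest, latest) year, or None when no record has a year
        "year_range": (min(years), max(years)) if years else None,
        # total number of whitespace-separated words across all titles
        "title_words": sum(len((r.get("title") or "").split()) for r in records),
    }


# Hypothetical sample records, not taken from the dataset.
sample_records = [
    {"title": "Viral dynamics in hosts", "year": 2020},
    {"title": "Vaccine trial results", "year": 2022},
    {"title": "Early report", "year": None},
]
metrics = summary_metrics(sample_records)
```

Each value in the returned dictionary maps naturally onto one metric card in the Overview tab.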

Usage

Prerequisites

  • Python 3.7+
  • Internet connection for dependency installation

Installation

  1. Create Virtual Environment:

    python -m venv venv
    venv\Scripts\activate  # Windows
    source venv/bin/activate  # macOS/Linux

  2. Install Dependencies:

    pip install -r requirements.txt

Data Preparation

  • Place the metadata.csv file in the project root directory
  • Ensure the CSV contains the CORD-19 dataset columns (cord_uid, title, abstract, publish_time, authors, journal)
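
A quick way to check the second point before running anything is to compare the CSV header against the columns listed above. The sketch below uses only the standard library; the helper name `missing_columns` is hypothetical, and the sample header stands in for the real file.

```python
import csv
import io

# Columns listed above for metadata.csv.
REQUIRED_COLUMNS = {"cord_uid", "title", "abstract", "publish_time", "authors", "journal"}


def missing_columns(csv_file) -> set:
    """Return the required columns that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return REQUIRED_COLUMNS - set(header)


# Hypothetical header lacking 'abstract', used in place of the real file.
sample = io.StringIO("cord_uid,title,publish_time,authors,journal\n")
print(sorted(missing_columns(sample)))  # prints ['abstract']
```

For the real file, the same function can be called on `open("metadata.csv", newline="", encoding="utf-8")`; an empty set means every expected column is present.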

Running the Notebook

    jupyter notebook COVID.ipynb

Process Flow:

  1. Import libraries
  2. Load and examine data structure
  3. Perform data cleaning
  4. Transform dates and add derived columns
  5. Generate analyses (publications by year, top journals, word frequencies)
  6. Create visualizations
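
Steps 2 to 5 can be sketched without any third-party packages, as below. The notebook itself depends on pandas, so this is a simplified stand-in rather than its actual code: the sample rows, the function names, the rule of dropping untitled rows, and the two accepted date formats (`YYYY-MM-DD` and bare `YYYY`) are all assumptions.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows standing in for metadata.csv.
SAMPLE = """cord_uid,title,publish_time,journal
a1,Viral dynamics,2020-03-15,The Lancet
a2,,2020-07-01,BMJ
a3,Vaccine trial results,2021,Nature
a4,Early SARS report,not-a-date,BMJ
"""


def parse_year(value: str):
    """Return the year from 'YYYY-MM-DD' or 'YYYY' text, or None if unparseable."""
    for fmt in ("%Y-%m-%d", "%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).year
        except ValueError:
            continue
    return None


def publications_by_year(csv_file) -> Counter:
    """Load rows, drop unusable ones, derive a year, and count papers per year."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if not (row.get("title") or "").strip():  # clean: skip untitled rows
            continue
        year = parse_year(row.get("publish_time") or "")  # transform: derive year
        if year is not None:
            counts[year] += 1  # analyze: tally per year
    return counts


by_year = publications_by_year(io.StringIO(SAMPLE))
```

In this sample, row `a2` is dropped for its missing title and row `a4` for its unparseable date, leaving one paper each for 2020 and 2021. The resulting counts are what a publications-by-year line or bar chart would plot.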

Running the Streamlit App

    streamlit run streamlit_app.py

Usage Guide:

  1. Overview Tab: View dataset statistics and sample publications
  2. Publications Tab: Analyze temporal trends and publication counts
  3. Journals Tab: Explore top publishing journals with customizable counts
  4. Word Analysis Tab: Examine title word frequencies and cloud visualizations
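
The title word frequencies behind the Word Analysis tab could be computed along these lines. This is a minimal sketch under stated assumptions: the tokenizing pattern, the tiny stop-word list, and the function name `title_word_frequencies` are illustrative and not taken from the project.

```python
import re
from collections import Counter

# Small illustrative stop-word list; an assumption, not the project's list.
STOP_WORDS = {"a", "an", "and", "in", "of", "the", "for", "on", "with", "to"}


def title_word_frequencies(titles, top_n=10):
    """Return the top_n (word, count) pairs across all titles, ignoring stop words."""
    counts = Counter()
    for title in titles:
        # Lowercase, then keep alphanumeric tokens, allowing inner hyphens ("covid-19").
        words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Hypothetical titles for illustration.
titles = ["COVID-19 and the lungs", "SARS in review", "Modelling COVID-19 spread"]
print(title_word_frequencies(titles, top_n=1))  # prints [('covid-19', 2)]
```

The same counts could feed both a frequency bar chart and a word cloud.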

Interactive Elements

  • Adjust slider controls to view different numbers of results
  • Use multiselect filters for year-based analysis
  • Resize browser window for responsive layout
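
To show how the slider and year filter might drive a result, the sketch below takes the selected years and a result count as plain arguments. In the app these values would presumably come from widgets such as `st.slider` and `st.multiselect`; the record layout and the function name `top_journals` are assumptions.

```python
from collections import Counter


def top_journals(records, selected_years, top_n):
    """Count journals among records in the selected years; return the top_n."""
    years = set(selected_years)
    counts = Counter(
        r["journal"] for r in records if r["year"] in years and r["journal"]
    )
    return counts.most_common(top_n)


# Hypothetical records for illustration; the empty journal is skipped.
records = [
    {"journal": "BMJ", "year": 2020},
    {"journal": "BMJ", "year": 2021},
    {"journal": "Nature", "year": 2020},
    {"journal": "", "year": 2020},
    {"journal": "BMJ", "year": 2020},
]
print(top_journals(records, selected_years=[2020], top_n=1))  # prints [('BMJ', 2)]
```

Moving the slider corresponds to changing `top_n`, and changing the year selection corresponds to a different `selected_years` list.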

Data Insights

The notebook and the Streamlit app reveal key patterns in COVID-19 research:

  • Accelerated publication growth from 2020 onwards
  • Medical and scientific journals as primary publishers
  • Thematic focus on COVID, SARS, viral topics in titles
  • Increasing publication volume reflecting pandemic urgency

Technical Notes

  • Dependencies: pandas, matplotlib, seaborn, streamlit, wordcloud
  • Data Size: Optimized for datasets of 1M rows or more
  • Performance: Streamlit caching prevents redundant data processing
  • Compatibility: Tested on Windows 11 with Python 3.9-3.13

For detailed code documentation, see inline comments in respective files.
