JethroKimande/PLP-Python-WK-8

CORD-19 Data Analysis and Streamlit App

This project analyzes the CORD-19 metadata dataset and presents findings through a Streamlit web application.

Overview

The CORD-19 dataset contains information about COVID-19 research papers. This analysis focuses on:

  • Publication trends over time
  • Top publishing journals
  • Word frequency in paper titles
  • Distribution by data source
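As an illustration of the title word-frequency step, here is a minimal sketch using only the standard library. The stop-word list, the function name `title_word_counts`, and the sample titles are all made up for this example; the notebook's actual approach may differ.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; the project's own list is not shown.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def title_word_counts(titles, top_n=10):
    """Return the top_n most common non-stop words across the given titles."""
    counts = Counter()
    for title in titles:
        if not isinstance(title, str):
            continue  # skip missing titles (None or NaN)
        # Lowercase, then keep alphanumeric tokens, allowing inner hyphens ("covid-19").
        words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", title.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up sample titles, not taken from the dataset.
sample_titles = [
    "Clinical features of COVID-19 in children",
    "Coronavirus transmission and COVID-19 outcomes",
    None,
]
top_words = title_word_counts(sample_titles, top_n=3)
```

The resulting list of `(word, count)` pairs can be passed to a bar chart or used to build a word cloud.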

Dataset

The metadata.csv file from the CORD-19 dataset includes:

  • Paper titles and abstracts
  • Publication dates
  • Authors and journals
  • Source information
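A sketch of loading only the fields listed above with pandas. The column names (`title`, `abstract`, `publish_time`, `authors`, `journal`, `source_x`) are assumptions about the CSV header and should be checked against your copy of metadata.csv; `load_metadata` is a hypothetical helper, and the in-memory rows are invented stand-ins for the real file.

```python
import io

import pandas as pd

# Assumed column names; verify them against the header of your metadata.csv.
COLUMNS = ["title", "abstract", "publish_time", "authors", "journal", "source_x"]


def load_metadata(path_or_buffer, nrows=None):
    """Load only the columns the analysis needs, reading every field as a string."""
    return pd.read_csv(path_or_buffer, usecols=COLUMNS, dtype=str, nrows=nrows)


# Tiny in-memory stand-in for metadata.csv (made-up rows, for illustration only).
sample_csv = io.StringIO(
    "cord_uid,title,abstract,publish_time,authors,journal,source_x\n"
    'a1,Example title one,,2020-03-15,"Doe, J.",Example Journal,PMC\n'
    'b2,Example title two,Some abstract,2019,"Roe, R.",,bioRxiv\n'
)
df = load_metadata(sample_csv)
```

Restricting `usecols` and passing `nrows` during exploration keeps memory use down on the full file. With the real data, the call would be `load_metadata("metadata.csv")`.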

Download from: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

Installation

  1. Clone this repository
  2. Create a virtual environment: python -m venv .venv
  3. Activate the environment: .venv\Scripts\activate (Windows) or source .venv/bin/activate (macOS/Linux)
  4. Install dependencies: pip install -r requirements.txt

Usage

Jupyter Notebook

Run the cord19_analysis.ipynb notebook for data exploration and analysis.

Streamlit App

Run the Streamlit app: streamlit run app.py

The app allows interactive filtering by year range and data source.
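The filtering could be structured roughly as follows. This is a sketch, not the contents of app.py: the `year` and `source_x` column names, the helper names `filter_papers` and `render_app`, and the sample rows are assumptions made for illustration.

```python
import pandas as pd


def filter_papers(df, year_range, sources):
    """Keep rows whose year is inside year_range (inclusive) and whose source is selected."""
    start, end = year_range
    mask = df["year"].between(start, end) & df["source_x"].isin(sources)
    return df[mask]


def render_app(df):
    """Draw sidebar filters and a papers-per-year chart; meant to be called from app.py."""
    import streamlit as st  # imported here so filter_papers works without Streamlit

    years = (int(df["year"].min()), int(df["year"].max()))
    # Passing a tuple as the value turns the slider into a range slider.
    year_range = st.sidebar.slider("Year range", years[0], years[1], years)
    all_sources = sorted(df["source_x"].dropna().unique())
    sources = st.sidebar.multiselect("Data source", all_sources, default=all_sources)

    filtered = filter_papers(df, year_range, sources)
    st.write(f"{len(filtered)} papers match the current filters")
    st.bar_chart(filtered["year"].value_counts().sort_index())


# Made-up rows to show the filter on its own, outside Streamlit.
sample = pd.DataFrame(
    {"year": [2019, 2020, 2020, 2021], "source_x": ["PMC", "PMC", "bioRxiv", "WHO"]}
)
subset = filter_papers(sample, (2020, 2021), ["PMC", "bioRxiv"])
```

Keeping the filter in a plain function separates it from the widget code, so it can be checked in the notebook without starting the app.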

Findings

  • Most COVID-19 papers were published in 2020
  • Top journals include various medical and scientific publications
  • Common words in titles include covid, coronavirus, and sars
  • Papers come primarily from PMC, bioRxiv, and similar sources

Challenges

  • Handling large dataset size
  • Dealing with missing values in abstracts
  • Parsing dates in various formats
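One way to approach the mixed date formats is to try a short list of known formats and treat everything else as missing. The sketch below uses only the standard library; the format list and the helper name `parse_publish_year` are assumptions, and the notebook may handle this differently (for example with pandas).

```python
from datetime import datetime

# Assumed set of formats; extend it if your copy of the data uses others.
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y-%m", "%Y")


def parse_publish_year(value):
    """Return the publication year as an int, or None if the value cannot be parsed."""
    if not isinstance(value, str):
        return None  # missing values such as None or NaN
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            continue  # try the next format
    return None


# Made-up inputs covering full dates, year-only values, and unparseable entries.
raw_dates = ["2020-03-15", "2020", "2019/12/01", "", None, "n.d."]
years = [parse_publish_year(v) for v in raw_dates]
```

Returning `None` for unparseable values keeps bad rows visible, so they can be counted and dropped explicitly rather than silently skewing the yearly totals.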

Technologies Used

  • Python
  • Pandas
  • Matplotlib
  • Seaborn
  • Streamlit
  • WordCloud
