TinashePisira/Week-8-Python-Assignment-


CORD-19 Data Explorer

This project provides a Python script for analyzing and visualizing a subset of the COVID-19 Open Research Dataset (CORD-19). The script has two main parts: a command-line analysis workflow and a web-based data explorer built with the Streamlit framework.

Features

Data Loading: Automatically downloads and loads the CORD-19 metadata.

Data Cleaning: Handles missing values and prepares the data for analysis.

Data Analysis:

Counts papers by publication year.

Identifies the top publishing journals.

Finds the most frequent words in paper titles.

Visualizations: Generates various plots to visualize key insights, including a word cloud of titles, a bar chart of publications over time, and a chart of the top journals.

Interactive Web App: A Streamlit application that allows users to explore the data with an interactive slider for filtering by publication year.
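The cleaning and analysis steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `title`, `journal`, and `publish_time` are assumed to match the CORD-19 metadata file, the sample rows are invented, and every helper name other than `clean_data` is hypothetical.

```python
from collections import Counter
import re

import pandas as pd

# Tiny stand-in for the CORD-19 metadata; these rows are made up for illustration.
sample = pd.DataFrame(
    {
        "title": [
            "Coronavirus transmission in hospitals",
            "Vaccine response to coronavirus infection",
            None,
            "Modelling the spread of infection",
            "Coronavirus vaccine trials",
        ],
        "journal": ["Journal A", "Journal B", "Journal A", None, "Journal A"],
        "publish_time": ["2020-03-01", "2021-06-15", "2020-11-30", "not a date", "2020-09-10"],
    }
)

# Assumed stop-word list; a real analysis would likely use a longer one.
STOP_WORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}


def clean_data(df):
    """Drop rows without a title or a parseable date, then add a 'year' column."""
    cleaned = df.dropna(subset=["title"]).copy()
    # Unparseable dates become NaT and are dropped on the next line.
    cleaned["publish_time"] = pd.to_datetime(cleaned["publish_time"], errors="coerce")
    cleaned = cleaned.dropna(subset=["publish_time"])
    cleaned["year"] = cleaned["publish_time"].dt.year
    return cleaned


def papers_per_year(df):
    """Count papers per publication year, ordered by year."""
    return df["year"].value_counts().sort_index()


def top_journals(df, n=10):
    """Return the n most frequent journals (missing journals are skipped)."""
    return df["journal"].value_counts().head(n)


def top_title_words(df, n=10):
    """Return the n most common lowercase title words, excluding stop words."""
    words = []
    for title in df["title"]:
        words.extend(w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP_WORDS)
    return Counter(words).most_common(n)


cleaned = clean_data(sample)
print(papers_per_year(cleaned))
print(top_journals(cleaned, n=3))
print(top_title_words(cleaned, n=3))
```

The counts returned by `papers_per_year` and `top_journals` are pandas Series, so they could be passed to matplotlib (for example with `.plot(kind="bar")`) to produce the bar charts described above.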

Getting Started

Prerequisites

To run the script, you will need to have Python installed on your system along with the following libraries:

pandas

requests

matplotlib

wordcloud

streamlit

You can install these dependencies using pip:

pip install pandas requests matplotlib wordcloud streamlit

How to Run

1. Command-Line Analysis

The script contains a main execution block that, when run directly, performs the data loading, cleaning, and visualization steps and displays the plots in separate windows.

To run this part of the script, save the code as cord_19_analysis.py and execute it from your terminal:

python cord_19_analysis.py

The script will print analysis outputs to the console and display the generated plots.

2. Streamlit Web App

To run the interactive web application, you must first have Streamlit installed. The run_streamlit_app() function is designed to be the entry point for the app.

To launch the app, save the code as app.py and run the following command in your terminal:

streamlit run app.py

This will open a new tab in your web browser with the interactive CORD-19 Data Explorer.
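A year-range slider of the kind described here might look like the sketch below. It is an assumption-laden illustration rather than the project's code: the `df` parameter, the `year` column, and the `filter_by_year` helper are hypothetical, and Streamlit is imported inside the function so the filtering logic can be exercised without it.

```python
import pandas as pd


def filter_by_year(df, year_range):
    """Return rows whose 'year' lies in the inclusive (start, end) range."""
    start, end = year_range
    return df[df["year"].between(start, end)]


def run_streamlit_app(df):
    """Render a simple explorer page; expects a DataFrame with a 'year' column."""
    # Imported lazily so filter_by_year can be used without Streamlit installed.
    import streamlit as st

    st.title("CORD-19 Data Explorer")

    # The slider needs min_year < max_year, so the data should span several years.
    min_year, max_year = int(df["year"].min()), int(df["year"].max())
    # Passing a tuple as the default value makes st.slider return a (start, end) pair.
    year_range = st.slider("Publication year", min_year, max_year, (min_year, max_year))

    filtered = filter_by_year(df, year_range)
    st.write(f"{len(filtered)} papers selected")
    st.bar_chart(filtered["year"].value_counts().sort_index())
```

Under these assumptions, calling `run_streamlit_app(df)` at the bottom of `app.py` with a cleaned DataFrame and launching the file with `streamlit run app.py` would render the page, and moving the slider would rerun the script with the new range.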

Code Structure

The script is organized into logical parts to make it easy to understand and modify:

load_data(): Responsible for fetching the dataset.

clean_data(): Cleans and prepares the DataFrame.

analyze_and_visualize(): Performs the core analysis and generates static plots.

run_streamlit_app(): Contains the full code for the Streamlit web application.

if __name__ == '__main__': The main block that controls which part of the script is executed.
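The README does not say how the main block chooses between the two parts. One possible arrangement, shown purely as a sketch, is to pick the entry point from a command-line argument. The `app` argument, `run_analysis`, and `choose_entry_point` are hypothetical names, and both functions are placeholders that stand in for the real work.

```python
import sys


def run_analysis():
    """Placeholder for load_data(), clean_data() and analyze_and_visualize()."""
    return "analysis"


def run_streamlit_app():
    """Placeholder for the Streamlit page."""
    return "app"


def choose_entry_point(argv):
    """Pick the web app when 'app' is passed as an argument, else the analysis."""
    return run_streamlit_app if "app" in argv[1:] else run_analysis


if __name__ == "__main__":
    entry_point = choose_entry_point(sys.argv)
    print(f"Running: {entry_point()}")
```

With this layout, `python cord_19_analysis.py` would run the analysis. Streamlit forwards arguments placed after `--` to the script, so `streamlit run app.py -- app` should select the web app instead.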
