MarcoN16/Data_Visualization_Project

Project Overview

This script analyzes data from Pymaceuticals, a pharmaceutical company specializing in anti-cancer medications, which is screening potential treatments for squamous cell carcinoma (SCC), a common form of skin cancer. The dataset comes from the company's recent animal study of 249 mice with SCC tumors, each treated with one of several drug regimens over a 45-day period. The study's primary aim was to compare the effectiveness of Pymaceuticals' drug of interest, Capomulin, against the other treatment regimens. The script generates the tables and figures needed for the study's technical report.

Preparation of Data

  1. Data Import and Merging: Initially, the script imports and merges two datasets, mouse_metadata and study_results, into a unified DataFrame.

  2. Duplicate Time Points Handling: The script identifies any mouse ID that has duplicate time points, displays the associated data, and creates a new DataFrame that excludes it, giving a clean dataset for subsequent analysis.
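The two preparation steps can be sketched as follows. This is a minimal illustration on toy data, not the script's actual code: the column names (`Mouse ID`, `Timepoint`, `Tumor Volume (mm3)`, `Drug Regimen`) are assumptions, and the sketch assumes that every row belonging to an affected mouse is excluded, which the description above does not state explicitly.

```python
import pandas as pd

# Toy stand-ins for the two source tables (column names are assumptions).
mouse_metadata = pd.DataFrame({
    "Mouse ID": ["a1", "b2", "c3"],
    "Drug Regimen": ["Capomulin", "Ramicane", "Capomulin"],
})
study_results = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "b2", "b2", "c3"],
    "Timepoint": [0, 5, 0, 0, 5, 0],
    "Tumor Volume (mm3)": [45.0, 43.1, 45.0, 45.0, 46.2, 45.0],
})

# Step 1: merge the measurements with the per-mouse metadata.
merged = pd.merge(study_results, mouse_metadata, on="Mouse ID", how="left")

# Step 2: find mice that repeat a (Mouse ID, Timepoint) pair.
dup_mask = merged.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
dup_ids = merged.loc[dup_mask, "Mouse ID"].unique()

# Display the associated data, then build a DataFrame without those mice.
duplicate_rows = merged[merged["Mouse ID"].isin(dup_ids)]
print(duplicate_rows)
clean = merged[~merged["Mouse ID"].isin(dup_ids)].reset_index(drop=True)
```

Passing `keep=False` to `duplicated` flags every copy of a repeated pair rather than only the later ones, which makes it easy to inspect all conflicting rows before dropping them.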

Summary Statistics Generation

The script builds a summary statistics table with:

• One row for each drug regimen.

• Columns for the mean, median, variance, standard deviation, and standard error of the mean (SEM) of the tumor volume.

Summary_Statistics
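One way to produce a table of this shape is a single `groupby` followed by `agg`. The sketch below uses toy values and assumed column names (`Drug Regimen`, `Tumor Volume (mm3)`); it is an illustration of the technique rather than the script's own implementation.

```python
import pandas as pd

# Toy data; the real table would come from the cleaned study DataFrame.
clean = pd.DataFrame({
    "Drug Regimen": ["Capomulin"] * 3 + ["Ramicane"] * 3,
    "Tumor Volume (mm3)": [40.0, 42.0, 44.0, 38.0, 41.0, 44.0],
})

# One row per regimen, one column per statistic.
summary = (
    clean.groupby("Drug Regimen")["Tumor Volume (mm3)"]
    .agg(["mean", "median", "var", "std", "sem"])
)
print(summary)
```

Note that pandas computes `var`, `std`, and `sem` with the sample (n - 1) denominator by default.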

Bar Charts and Pie Charts Creation

  1. Bar Charts: Two bar charts display the total number of timepoints recorded for each drug regimen over the study period. The first chart uses the Pandas DataFrame.plot() method. The second is created with Matplotlib's pyplot methods.

bar_chart_pandas bar_chart_pyplot

  2. Pie Charts: Two pie charts illustrate the distribution of female versus male mice in the study. The first pie chart uses the Pandas DataFrame.plot() method. The second is created with Matplotlib's pyplot methods.

pie_chart_pandas pie_chart_pandas
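The difference between the two plotting routes can be sketched as below. The data are toy values, the column names (`Drug Regimen`, `Sex`) are assumptions, and the sketch calls `.plot()` on the Series returned by `value_counts()`, which shares the plotting interface of `DataFrame.plot()`. Whether the script counts rows or unique mice for the sex distribution is not stated above; this sketch simply counts rows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data with assumed column names.
clean = pd.DataFrame({
    "Drug Regimen": ["Capomulin", "Capomulin", "Capomulin",
                     "Ramicane", "Ramicane", "Placebo"],
    "Sex": ["Male", "Female", "Male", "Female", "Male", "Male"],
})

regimen_counts = clean["Drug Regimen"].value_counts()
sex_counts = clean["Sex"].value_counts()

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
(ax_bar_pd, ax_bar_plt), (ax_pie_pd, ax_pie_plt) = axes

# Bar chart via the pandas plotting interface.
regimen_counts.plot(kind="bar", ax=ax_bar_pd, title="Rows per regimen (pandas)")

# The same bar chart via Matplotlib directly.
ax_bar_plt.bar(regimen_counts.index.tolist(), regimen_counts.values)
ax_bar_plt.set_title("Rows per regimen (pyplot)")

# Pie chart via pandas, then via Matplotlib.
sex_counts.plot(kind="pie", ax=ax_pie_pd, autopct="%1.1f%%", title="Sex (pandas)")
ax_pie_plt.pie(sex_counts.values, labels=sex_counts.index.tolist(), autopct="%1.1f%%")
ax_pie_plt.set_title("Sex (pyplot)")

fig.tight_layout()
```

The pandas route takes labels and tick positions from the index automatically, while the Matplotlib route needs them passed explicitly.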

Quartiles, Outliers Identification, and Box Plot Generation

  1. Quartile Calculation and Outlier Identification: Analysis focuses on determining the final tumor volume for mice under four prominent treatment regimens: Capomulin, Ramicane, Infubinol, and Ceftamin. Quartiles, Interquartile Range (IQR), and potential outliers across these regimens are computed.

  2. Box Plot Creation: Using Matplotlib, a box plot displays the distribution of final tumor volume for mice in each treatment group. Any potential outliers are highlighted by altering their color and style.

Boxplot_pyplot
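The quartile and outlier steps can be sketched as follows, assuming the common 1.5 × IQR rule for flagging potential outliers (the description above does not name the rule the script uses). The data are toy values for two of the four regimens, and the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: each mouse has a reading at day 0 and at day 45.
toy_finals = {
    "Capomulin": [38.0, 39.0, 40.0, 41.0, 42.0],
    "Infubinol": [60.0, 62.0, 64.0, 66.0, 36.0],
}
rows = []
for drug, vols in toy_finals.items():
    for i, v in enumerate(vols):
        mouse = f"{drug[:3]}{i}"  # hypothetical mouse IDs
        rows.append((mouse, drug, 0, 45.0))
        rows.append((mouse, drug, 45, v))
clean = pd.DataFrame(
    rows, columns=["Mouse ID", "Drug Regimen", "Timepoint", "Tumor Volume (mm3)"]
)

# Final tumor volume: the row at each mouse's last recorded timepoint.
last_tp = clean.groupby("Mouse ID")["Timepoint"].max().reset_index()
final = last_tp.merge(clean, on=["Mouse ID", "Timepoint"], how="left")

treatments = list(toy_finals)  # the full analysis would list all four regimens
volumes, outliers = [], {}
for drug in treatments:
    vols = final.loc[final["Drug Regimen"] == drug, "Tumor Volume (mm3)"]
    q1, q3 = vols.quantile(0.25), vols.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers[drug] = vols[(vols < lower) | (vols > upper)].tolist()
    volumes.append(vols.to_numpy())

# Box plot with restyled outlier markers.
fig, ax = plt.subplots()
ax.boxplot(volumes, flierprops=dict(marker="o", markerfacecolor="red", markersize=8))
ax.set_xticks(range(1, len(treatments) + 1))
ax.set_xticklabels(treatments)
ax.set_ylabel("Final Tumor Volume (mm3)")
```

The `flierprops` argument is what changes the color and style of the outlier markers; the box and whiskers keep their defaults.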

Line Plot and Scatter Plot Creation

  1. Line Plot: A line plot shows tumor volume versus time point for a single mouse treated with Capomulin.

  2. Scatter Plot: A scatter plot shows mouse weight versus the average observed tumor volume for every mouse in the Capomulin regimen.

Example_mouse_treated_Capomulin Scatter_plot
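Both plots start from the Capomulin subset of the data. The sketch below uses toy values, hypothetical mouse IDs, and assumed column names (including `Weight (g)`); it shows one plausible way to build the two figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Toy data with assumed column names and hypothetical mouse IDs.
clean = pd.DataFrame({
    "Mouse ID": ["m1"] * 3 + ["m2"] * 3 + ["m3"] * 3 + ["r1"],
    "Drug Regimen": ["Capomulin"] * 9 + ["Ramicane"],
    "Weight (g)": [17] * 3 + [21] * 3 + [25] * 3 + [20],
    "Timepoint": [0, 5, 10] * 3 + [0],
    "Tumor Volume (mm3)": [45.0, 43.0, 41.0, 45.0, 44.0, 43.0,
                           45.0, 46.0, 47.0, 45.0],
})

capomulin = clean[clean["Drug Regimen"] == "Capomulin"]

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: tumor volume over time for one mouse.
one_mouse = capomulin[capomulin["Mouse ID"] == "m1"]
ax_line.plot(one_mouse["Timepoint"], one_mouse["Tumor Volume (mm3)"], marker="o")
ax_line.set_xlabel("Timepoint (days)")
ax_line.set_ylabel("Tumor Volume (mm3)")

# Scatter plot: one point per mouse, weight vs. mean tumor volume.
per_mouse = capomulin.groupby("Mouse ID").agg(
    weight=("Weight (g)", "first"),
    mean_volume=("Tumor Volume (mm3)", "mean"),
)
ax_scatter.scatter(per_mouse["weight"], per_mouse["mean_volume"])
ax_scatter.set_xlabel("Weight (g)")
ax_scatter.set_ylabel("Average Tumor Volume (mm3)")

fig.tight_layout()
```

Aggregating to one row per mouse before plotting keeps mice with more recorded timepoints from being over-represented in the scatter plot.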

Correlation and Regression Analysis

  1. Correlation and Regression: The script concludes by calculating the correlation coefficient and developing a linear regression model to explore the relationship between mouse weight and the average observed tumor volume throughout the Capomulin treatment regimen.

Linear_correlation
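The correlation and regression step can be sketched with NumPy alone, as below. This is an assumption for illustration: the script may well use a different library (for example SciPy's statistics functions), and the weights and volumes here are toy per-mouse values, not study results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Toy per-mouse values: weight (g) and average tumor volume (mm3).
weight = np.array([17.0, 21.0, 25.0])
mean_volume = np.array([43.0, 44.0, 46.0])

# Pearson correlation coefficient between the two variables.
r = np.corrcoef(weight, mean_volume)[0, 1]

# Least-squares line: mean_volume ~ slope * weight + intercept.
slope, intercept = np.polyfit(weight, mean_volume, 1)
fitted = slope * weight + intercept

# Overlay the fitted line on the scatter plot.
fig, ax = plt.subplots()
ax.scatter(weight, mean_volume)
ax.plot(weight, fitted, color="red")
ax.set_xlabel("Weight (g)")
ax.set_ylabel("Average Tumor Volume (mm3)")

print(f"r = {r:.2f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

For a simple linear fit like this, the sign of the slope always matches the sign of the correlation coefficient, so the two numbers should be read together.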
