This script analyzes data from Pymaceuticals, a pharmaceutical company specializing in anti-cancer medications, which is screening potential treatments for squamous cell carcinoma (SCC), a common form of skin cancer. The dataset comes from the company's recent animal study, in which 249 mice with SCC tumors were treated with various drug regimens over a 45-day period. The study's primary aim was to compare the effectiveness of Pymaceuticals' drug of interest, Capomulin, against the other treatment regimens. The script generates the tables and figures needed for the study's technical report.
- Data Import and Merging: The script first imports two datasets, mouse_metadata and study_results, and merges them into a single DataFrame.
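The merge step can be sketched as follows. This is a minimal illustration, not the script's actual code: the column names ("Mouse ID", "Timepoint", and so on), the toy rows, and the choice of a left join on "Mouse ID" are all assumptions, since the description above only says that the two tables are combined.

```python
import pandas as pd

# Toy stand-ins for the two source tables; column names and values are assumed.
mouse_metadata = pd.DataFrame({
    "Mouse ID": ["m001", "m002"],
    "Drug Regimen": ["Capomulin", "Ramicane"],
    "Sex": ["Female", "Male"],
    "Weight (g)": [22, 25],
})
study_results = pd.DataFrame({
    "Mouse ID": ["m001", "m001", "m002"],
    "Timepoint": [0, 5, 0],
    "Tumor Volume (mm3)": [45.0, 43.2, 45.0],
})

# One row per measurement, with that mouse's metadata attached.
merged = pd.merge(study_results, mouse_metadata, on="Mouse ID", how="left")
print(merged.shape)
```

A left join from the measurements table keeps every recorded time point even if a mouse were missing from the metadata, which makes such gaps visible as NaN values rather than silently dropping rows.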
- Duplicate Time Points Handling: The script identifies any mouse ID with duplicate time points, displays the associated data, and creates a new DataFrame that excludes this data, giving a clean dataset for the subsequent analysis.
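One way to express the duplicate check is shown below. It is a sketch under stated assumptions: the mouse IDs and column names are hypothetical, and it assumes that "excluding this data" means dropping every row for an affected mouse rather than only the repeated rows.

```python
import pandas as pd

# Hypothetical data: mouse m002 has two rows for Timepoint 0.
df = pd.DataFrame({
    "Mouse ID": ["m001", "m001", "m002", "m002", "m002"],
    "Timepoint": [0, 5, 0, 0, 5],
    "Tumor Volume (mm3)": [45.0, 43.2, 45.0, 45.0, 47.1],
})

# Flag every row whose (Mouse ID, Timepoint) pair occurs more than once.
dup_mask = df.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
dup_ids = df.loc[dup_mask, "Mouse ID"].unique()

# Display all data for the affected mice, then build the cleaned DataFrame.
print(df[df["Mouse ID"].isin(dup_ids)])
clean = df[~df["Mouse ID"].isin(dup_ids)].reset_index(drop=True)
```

If only the repeated rows should be removed instead, `df.drop_duplicates(subset=["Mouse ID", "Timepoint"])` would be the narrower alternative.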
- Summary Statistics: The script computes a summary statistics table of tumor volume, consisting of:
• One row for each drug regimen.
• Columns for the mean, median, variance, standard deviation, and SEM (Standard Error of the Mean) of the tumor volume.
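A table of this shape can be produced with a single groupby aggregation. The snippet below is a minimal sketch with made-up values and assumed column names. Note that pandas computes the sample variance and standard deviation (ddof=1) by default.

```python
import pandas as pd

# Made-up tumor volumes for two regimens.
df = pd.DataFrame({
    "Drug Regimen": ["Capomulin"] * 3 + ["Ramicane"] * 3,
    "Tumor Volume (mm3)": [40.0, 42.0, 44.0, 38.0, 40.0, 45.0],
})

# One row per regimen, one column per statistic.
summary = (
    df.groupby("Drug Regimen")["Tumor Volume (mm3)"]
      .agg(["mean", "median", "var", "std", "sem"])
)
print(summary)
```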
- Bar Charts: Two bar charts display the total number of time points recorded for each drug regimen over the study period. The first chart uses the Pandas DataFrame.plot() method. The second chart is created using Matplotlib's pyplot methods.
- Pie Charts: Two pie charts illustrate the distribution of female versus male mice in the study. The first pie chart employs the Pandas DataFrame.plot() method. The second pie chart is generated through Matplotlib's pyplot methods.
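The two plotting routes for the bar and pie charts can be compared side by side, as in the sketch below. The data and column names are invented for illustration. The pandas route calls `.plot()` on a Series of counts (the same plotting interface as DataFrame.plot()), and the Matplotlib route uses Axes methods, which `plt.bar` and `plt.pie` wrap.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented rows: one per recorded time point.
df = pd.DataFrame({
    "Drug Regimen": ["Capomulin", "Capomulin", "Ramicane",
                     "Placebo", "Placebo", "Placebo"],
    "Sex": ["Female", "Male", "Male", "Female", "Male", "Male"],
})
counts = df["Drug Regimen"].value_counts()
sex_counts = df["Sex"].value_counts()

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Pandas route: the plotting call lives on the data object.
counts.plot(kind="bar", ax=axes[0, 0], title="Time points per regimen (pandas)")
sex_counts.plot(kind="pie", ax=axes[1, 0], autopct="%1.1f%%", title="Sex (pandas)")

# Matplotlib route: pass labels and values explicitly.
axes[0, 1].bar(counts.index, counts.values)
axes[0, 1].set_title("Time points per regimen (Matplotlib)")
axes[0, 1].tick_params(axis="x", rotation=90)
axes[1, 1].pie(sex_counts.values, labels=sex_counts.index, autopct="%1.1f%%")
axes[1, 1].set_title("Sex (Matplotlib)")

fig.tight_layout()
```

Call `fig.savefig(...)` or `plt.show()` afterwards to view the result. Both routes should draw the same charts, which is a convenient check that the counts were computed as intended.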
- Quartile Calculation and Outlier Identification: The analysis determines the final tumor volume for mice under four prominent treatment regimens: Capomulin, Ramicane, Infubinol, and Ceftamin. Quartiles, the Interquartile Range (IQR), and potential outliers are computed for each of these regimens.
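The final-volume and outlier steps might look like the sketch below. Everything in it is illustrative: the data is generated, the column names are assumed, and the 1.5 × IQR fences are a common convention that the description above does not explicitly name.

```python
import pandas as pd

# Generated data: each hypothetical mouse has a baseline row and a final row.
final_inputs = [30.0, 32.0, 34.0, 36.0, 60.0]
rows = []
for i, vol in enumerate(final_inputs):
    mouse = f"m{i:03d}"
    rows.append({"Mouse ID": mouse, "Drug Regimen": "Capomulin",
                 "Timepoint": 0, "Tumor Volume (mm3)": 45.0})
    rows.append({"Mouse ID": mouse, "Drug Regimen": "Capomulin",
                 "Timepoint": 45, "Tumor Volume (mm3)": vol})
df = pd.DataFrame(rows)

# Final tumor volume = the row at each mouse's greatest time point.
last_idx = df.groupby("Mouse ID")["Timepoint"].idxmax()
final = df.loc[last_idx]

results = {}
for regimen, vols in final.groupby("Drug Regimen")["Tumor Volume (mm3)"]:
    q1, q3 = vols.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed 1.5 x IQR rule
    outliers = vols[(vols < lower) | (vols > upper)]
    results[regimen] = {"q1": q1, "q3": q3, "iqr": iqr,
                        "outliers": outliers.tolist()}

print(results)
```

In the real analysis the loop would run over the four regimens of interest after filtering the DataFrame to them.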
- Box Plot Creation: Using Matplotlib, a box plot displays the distribution of final tumor volume for mice in each treatment group. Potential outliers are highlighted by changing their marker color and style.
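Outlier markers in a Matplotlib box plot are styled through the `flierprops` argument. The sketch below uses invented final volumes and an arbitrary red-circle style. The real values and styling choices belong to the script.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented final tumor volumes (mm3) per regimen; Infubinol includes one low outlier.
final_volumes = {
    "Capomulin": [30.0, 32.0, 34.0, 36.0, 38.0],
    "Ramicane": [29.0, 31.0, 33.0, 35.0, 37.0],
    "Infubinol": [36.0, 60.0, 62.0, 64.0, 66.0],
    "Ceftamin": [55.0, 58.0, 60.0, 62.0, 65.0],
}

fig, ax = plt.subplots()

# flierprops controls how points beyond the whiskers are drawn.
flier_style = dict(marker="o", markerfacecolor="red",
                   markeredgecolor="black", markersize=8)
bp = ax.boxplot(list(final_volumes.values()), flierprops=flier_style)

# Boxes are drawn at positions 1..N, so label those ticks with regimen names.
ax.set_xticks(range(1, len(final_volumes) + 1))
ax.set_xticklabels(list(final_volumes.keys()))
ax.set_ylabel("Final Tumor Volume (mm3)")
```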
- Line Plot: A line plot shows tumor volume versus time point for a single mouse treated with Capomulin.
- Scatter Plot: A scatter plot shows mouse weight versus the average observed tumor volume across the entire Capomulin treatment regimen.
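The line and scatter plots both start from the Capomulin rows. The sketch below assumes hypothetical mouse IDs and column names. The main point is the per-mouse aggregation that turns many measurements into one (weight, average volume) point per mouse.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical Capomulin measurements for two mice.
capomulin = pd.DataFrame({
    "Mouse ID": ["m001"] * 3 + ["m002"] * 3,
    "Timepoint": [0, 5, 10] * 2,
    "Tumor Volume (mm3)": [45.0, 43.0, 41.0, 45.0, 46.0, 44.0],
    "Weight (g)": [17] * 3 + [21] * 3,
})

fig, (ax_line, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Line plot: tumor volume over time for one chosen mouse.
one_mouse = capomulin[capomulin["Mouse ID"] == "m001"]
ax_line.plot(one_mouse["Timepoint"], one_mouse["Tumor Volume (mm3)"], marker="o")
ax_line.set_xlabel("Timepoint (days)")
ax_line.set_ylabel("Tumor Volume (mm3)")

# Scatter plot: one point per mouse, weight vs. average tumor volume.
per_mouse = capomulin.groupby("Mouse ID").agg(
    weight=("Weight (g)", "first"),
    avg_volume=("Tumor Volume (mm3)", "mean"),
)
ax_scatter.scatter(per_mouse["weight"], per_mouse["avg_volume"])
ax_scatter.set_xlabel("Weight (g)")
ax_scatter.set_ylabel("Average Tumor Volume (mm3)")

fig.tight_layout()
```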
- Correlation and Regression: The script concludes by calculating the correlation coefficient and developing a linear regression model to explore the relationship between mouse weight and the average observed tumor volume throughout the Capomulin treatment regimen.
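The correlation and regression step can be sketched with SciPy. The weights and average volumes below are invented, and using the Pearson coefficient is an assumption, since the description only says "correlation coefficient".

```python
from scipy import stats

# Invented per-mouse values: weight (g) and average tumor volume (mm3).
weights = [15, 17, 19, 21, 23, 25]
avg_volumes = [36.2, 37.1, 39.0, 40.4, 42.3, 44.9]

# Pearson correlation between weight and average tumor volume.
r, p_value = stats.pearsonr(weights, avg_volumes)

# Ordinary least-squares fit: avg_volume = slope * weight + intercept.
fit = stats.linregress(weights, avg_volumes)
predicted = [fit.slope * w + fit.intercept for w in weights]

print(f"r = {r:.2f}, slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")
```

For a simple linear regression, `fit.rvalue` equals the Pearson coefficient, so the two calls cross-check each other. Plotting `predicted` against `weights` on top of the scatter plot gives the regression line.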








