# Matplotlib - The Power of Plots 

As a senior data analyst at Pymaceuticals Inc., I have been given access to the complete data from their recent animal study. 

249 mice were identified with SCC (Squamous Cell Carcinoma), a commonly occurring form of skin cancer. They were treated with a variety of drug regimens. Over the course of 45 days, tumor development was observed and measured. 

The study data were used to compare the performance of Pymaceuticals' drug of interest, Capomulin, against the other treatment regimens, to generate all of the tables and figures needed for the technical report of the study, and to summarize the study results. 

### Dependencies and Setup

In [1]:

import matplotlib.pyplot as plt
import pandas as pd
import scipy.stats as st
import os

# Study data files
mouse_metadata_path = os.path.join(".", "data", "Mouse_metadata.csv")
study_results_path = os.path.join(".", "data", "Study_results.csv")

# Read the mouse data and the study results
mouse_metadata = pd.read_csv(mouse_metadata_path)
study_results = pd.read_csv(study_results_path)

# Combine the data into a single dataset
mouseid_data = pd.merge(mouse_metadata, study_results, how='outer', on=['Mouse ID'])

# Display the data table for preview
mouseid_data.head(40)


| | Mouse ID | Drug Regimen | Sex | Age_months | Weight (g) | Timepoint | Tumor Volume (mm3) | Metastatic Sites |
|---|---|---|---|---|---|---|---|---|
| 0 | k403 | Ramicane | Male | 21 | 16 | 0 | 45.0 | 0 |
| 1 | k403 | Ramicane | Male | 21 | 16 | 5 | 38.825898 | 0 |
| 2 | k403 | Ramicane | Male | 21 | 16 | 10 | 35.014271 | 1 |
| 3 | k403 | Ramicane | Male | 21 | 16 | 15 | 34.223992 | 1 |
| 4 | k403 | Ramicane | Male | 21 | 16 | 20 | 32.997729 | 1 |
| 5 | k403 | Ramicane | Male | 21 | 16 | 25 | 33.464577 | 1 |
| 6 | k403 | Ramicane | Male | 21 | 16 | 30 | 31.099498 | 1 |
| 7 | k403 | Ramicane | Male | 21 | 16 | 35 | 26.546993 | 1 |
| 8 | k403 | Ramicane | Male | 21 | 16 | 40 | 24.365505 | 1 |
| 9 | k403 | Ramicane | Male | 21 | 16 | 45 | 22.050126 | 1 |
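
The merge step can be checked on a small scale. The sketch below is not part of the original notebook: it uses two toy DataFrames (the IDs and values are invented for illustration) and assumes that each mouse appears exactly once in the metadata. Under that assumption, `validate="one_to_many"` makes pandas raise an error if a Mouse ID is repeated in the metadata, and `indicator=True` adds a `_merge` column showing which rows matched in both files.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (illustrative values, not study data)
metadata = pd.DataFrame({
    "Mouse ID": ["a1", "b2"],
    "Drug Regimen": ["Capomulin", "Ramicane"],
})
results = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2"],
    "Timepoint": [0, 5, 0],
    "Tumor Volume (mm3)": [45.0, 44.1, 45.0],
})

# One metadata row per mouse, many result rows per mouse
merged = pd.merge(
    metadata,
    results,
    how="outer",
    on="Mouse ID",
    validate="one_to_many",  # raises MergeError if metadata repeats a Mouse ID
    indicator=True,          # adds a "_merge" column: both / left_only / right_only
)

# Rows present in only one of the two inputs
unmatched = merged[merged["_merge"] != "both"]
print(merged)
print("Unmatched rows:", len(unmatched))
```

With an outer merge, any row in `unmatched` would point to a mouse that is missing from one of the two files.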


### Checking the number of mice.

In [2]:
mouseid_data['Mouse ID'].value_counts()

g989    13
l509    10
e584    10
s185    10
n304    10
        ..
d133     1
x226     1
f932     1
o848     1
u153     1
Name: Mouse ID, Length: 249, dtype: int64
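
`value_counts()` reports the number of rows per mouse, and the number of mice is read from the `Length` of the result. A minimal sketch on toy data (invented IDs, not the study data) shows that `nunique()` returns that count directly:

```python
import pandas as pd

# Toy data: three mice with differing numbers of timepoints (illustrative only)
toy = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "b2", "c3", "c3", "c3"],
    "Timepoint": [0, 5, 0, 0, 5, 10],
})

# Rows recorded for each mouse
rows_per_mouse = toy["Mouse ID"].value_counts()

# Number of distinct mice
n_mice = toy["Mouse ID"].nunique()

print(rows_per_mouse)
print("Number of mice:", n_mice)
```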

### Getting the duplicate mice by ID number, based on repeated Mouse ID and Timepoint pairs.

In [6]:
duplicate_mouseid = mouseid_data[mouseid_data.duplicated(['Mouse ID', 'Timepoint'])]
print(duplicate_mouseid)

    Mouse ID Drug Regimen     Sex  Age_months  Weight (g)  Timepoint  \
909     g989     Propriva  Female          21          26          0   
911     g989     Propriva  Female          21          26          5   
913     g989     Propriva  Female          21          26         10   
915     g989     Propriva  Female          21          26         15   
917     g989     Propriva  Female          21          26         20   

     Tumor Volume (mm3)  Metastatic Sites  
909           45.000000                 0  
911           47.570392                 0  
913           49.880528                 0  
915           53.442020                 0  
917           54.657650                 1  
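
By default, `duplicated()` flags only the second and later occurrences of each Mouse ID and Timepoint pair, so the first copy of each repeated row is not shown. The sketch below, on toy data with invented IDs, assumes the goal is to see every row involved in a duplication and to collect the affected IDs. It passes `keep=False` to flag all copies.

```python
import pandas as pd

# Toy data: mouse "a1" has two rows at Timepoint 0 (illustrative only)
toy = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "a1", "b2", "b2"],
    "Timepoint": [0, 0, 5, 0, 5],
    "Tumor Volume (mm3)": [45.0, 45.0, 47.2, 45.0, 43.9],
})

# keep=False marks every row whose (Mouse ID, Timepoint) pair occurs more than once
dup_mask = toy.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
dup_rows = toy[dup_mask]

# IDs of the mice with at least one repeated pair
dup_ids = dup_rows["Mouse ID"].unique()

print(dup_rows)
print("Duplicate mouse IDs:", list(dup_ids))
```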


### Create a clean DataFrame by dropping the duplicate mouse by its ID.

In [5]:
clean_data = mouseid_data.loc[mouseid_data["Mouse ID"] != "g989", :]
clean_data.head()

| | Mouse ID | Drug Regimen | Sex | Age_months | Weight (g) | Timepoint | Tumor Volume (mm3) | Metastatic Sites |
|---|---|---|---|---|---|---|---|---|
| 0 | k403 | Ramicane | Male | 21 | 16 | 0 | 45.0 | 0 |
| 1 | k403 | Ramicane | Male | 21 | 16 | 5 | 38.825898 | 0 |
| 2 | k403 | Ramicane | Male | 21 | 16 | 10 | 35.014271 | 1 |
| 3 | k403 | Ramicane | Male | 21 | 16 | 15 | 34.223992 | 1 |
| 4 | k403 | Ramicane | Male | 21 | 16 | 20 | 32.997729 | 1 |
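
The cell above removes the duplicate mouse by hard-coding its ID. As an alternative sketch (not the notebook's method), the IDs can be derived from the data and filtered out with `isin`, which also works if more than one mouse has repeated rows. The toy data and IDs below are invented for illustration.

```python
import pandas as pd

# Toy data: mouse "a1" has a repeated (Mouse ID, Timepoint) pair (illustrative only)
toy = pd.DataFrame({
    "Mouse ID": ["a1", "a1", "a1", "b2", "b2"],
    "Timepoint": [0, 0, 5, 0, 5],
    "Tumor Volume (mm3)": [45.0, 45.0, 47.2, 45.0, 43.9],
})

# Find every mouse with at least one repeated (Mouse ID, Timepoint) pair
dup_mask = toy.duplicated(subset=["Mouse ID", "Timepoint"], keep=False)
dup_ids = toy.loc[dup_mask, "Mouse ID"].unique()

# Drop all rows belonging to those mice, not only the repeated rows
clean = toy[~toy["Mouse ID"].isin(dup_ids)].copy()

print(clean)
print("Mice remaining:", clean["Mouse ID"].nunique())
```

Dropping every row for an affected mouse matches the approach taken in the notebook, where all records for the duplicated ID are excluded.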


### Checking the number of mice in the clean DataFrame.

In [7]:
clean_data['Mouse ID'].value_counts()

x402    10
g316    10
t198    10
v991    10
r604    10
        ..
t573     1
l872     1
h428     1
f932     1
b447     1
Name: Mouse ID, Length: 248, dtype: int64