**What is the data about?**

* Temperature (K)
* Pressure (P)

The data is recorded at 1-hour intervals every day to monitor drug stability during a drug development test.
These readings are then used to identify the optimal parameter values for drug stability.


In [None]:
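from google.colab import files

# Colab only: opens a file picker for uploading Pfizer_1.csv to the session
uploaded = files.upload()
print(list(uploaded))  # names of the uploaded files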

# Let's explore this dataset

In [None]:
import pandas as pd
import numpy as np

In [None]:
data = pd.read_csv('Pfizer_1.csv')
data

In [None]:
data.info()

As we saw earlier, the dataset has 18 rows and 15 columns.
###### If you look more closely, you'll notice that:
* The timestamp columns are labelled 1:30:00, 2:30:00, 3:30:00, and so on.
* The Temperature and Pressure readings for each date are stored in separate rows.

**Can we restructure our data into a better format?**
##### Maybe we can have a single column for time, with the timestamps as its values.



**Where will the Temperature/Pressure values go?**
###### Similarly, we can create one column containing the values of these parameters: "melt" the timestamp columns into two columns, one for the timestamp and one for the corresponding value.


**How can we restructure our data so that every row corresponds to a single reading?**

In [None]:
pd.melt(data, id_vars=['Date', 'Parameter', 'Drug_Name'])

This converts our data from wide to long format.
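To see what melt() does in isolation, here is a minimal sketch on a tiny hand-made frame that mimics the layout of Pfizer_1.csv. The drug name (DrugA), the date and all readings are invented for illustration; only the column layout follows the notebook.

```python
import pandas as pd

# Hypothetical wide-format sample: one row per (Date, Drug_Name, Parameter),
# one column per timestamp. DrugA, the date and the readings are made up.
wide = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01'],
    'Drug_Name': ['DrugA', 'DrugA'],
    'Parameter': ['Temperature', 'Pressure'],
    '1:30:00': [23.0, 12.0],
    '2:30:00': [22.0, 13.0],
})

# Keep the identifier columns and stack every other column into two new
# columns: 'variable' (the old column label) and 'value' (the reading).
long_df = pd.melt(wide, id_vars=['Date', 'Parameter', 'Drug_Name'])
print(long_df)
```

Each of the 2 original rows is repeated once per timestamp column, so the 2 × 2 grid of readings becomes 4 rows.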

**How can we rename the default "variable" and "value" columns to something more meaningful for our data?**

In [None]:
data_melt = pd.melt(data, id_vars=['Date', 'Drug_Name', 'Parameter'],
                    var_name='time',       # name for the column that holds the old column labels
                    value_name='reading')  # name for the column that holds the readings
data_melt

**Conclusion:**
* The labels of the timestamp columns (1:30:00, 2:30:00, ...) are conveniently melted into a single column, time. Without var_name, this column would be called variable.
* All the readings from the melted columns are kept in the reading column. Without value_name, this column would be called value.
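Because every melted column contributes one row per original row, the long frame should have len(data) × (number of timestamp columns) rows. A quick sanity check, sketched on a made-up two-row frame (DrugA, the date and the readings are hypothetical):

```python
import pandas as pd

# Invented wide-format sample in the same layout as Pfizer_1.csv.
wide = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01'],
    'Drug_Name': ['DrugA', 'DrugA'],
    'Parameter': ['Temperature', 'Pressure'],
    '1:30:00': [23.0, 12.0],
    '2:30:00': [22.0, 13.0],
    '3:30:00': [24.0, 11.0],
})
id_cols = ['Date', 'Drug_Name', 'Parameter']

# Rename the default 'variable'/'value' columns while melting.
long_df = pd.melt(wide, id_vars=id_cols, var_name='time', value_name='reading')

# Every non-identifier column becomes a block of len(wide) rows.
n_time_cols = wide.shape[1] - len(id_cols)
assert len(long_df) == len(wide) * n_time_cols
print(long_df.columns.tolist())  # ['Date', 'Drug_Name', 'Parameter', 'time', 'reading']
```

Assuming the 12 columns of Pfizer_1.csv other than Date, Drug_Name and Parameter are all timestamps, the same reasoning predicts 18 × 12 = 216 rows for data_melt.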


# Pivoting
###### Now suppose we want to convert our data back to the wide format.
###### This could be useful for storage or for some other purpose.
# Notice
* The columns Date, Drug_Name and Parameter will remain the same.
* The new column names will be taken from the time column.
* The values will be taken from the reading column.



**How can we restructure our data back to the original wide format?**

In [None]:
data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'],  # columns used to make the new frame's index
                columns='time',                            # column used to make the new frame's columns
                values='reading')                          # column used to populate the new frame's values


pivot() reverses melt(): it takes the long format back to the wide one.


* The result has a MultiIndex (Date, Drug_Name, Parameter) on its rows, but we can turn it back into ordinary columns using reset_index().
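A quick way to check that pivot() undoes melt() is a round trip on a small invented sample (DrugA, the date and the readings are hypothetical). pivot() sorts both the new index and the new columns, so the original frame is sorted the same way before the comparison.

```python
import pandas as pd

# Hypothetical wide-format sample (values invented for illustration).
wide = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01'],
    'Drug_Name': ['DrugA', 'DrugA'],
    'Parameter': ['Pressure', 'Temperature'],
    '1:30:00': [12.0, 23.0],
    '2:30:00': [13.0, 22.0],
})
id_cols = ['Date', 'Drug_Name', 'Parameter']

# Wide -> long.
long_df = pd.melt(wide, id_vars=id_cols, var_name='time', value_name='reading')

# Long -> wide: pivot() rebuilds the timestamp columns, reset_index()
# turns the three index levels back into ordinary columns.
back = (long_df.pivot(index=id_cols, columns='time', values='reading')
               .reset_index())
back.columns.name = None  # drop the leftover 'time' label on the column axis

# Sort the original the same way pivot() does before comparing.
expected = wide.sort_values(id_cols).reset_index(drop=True)
print(back.equals(expected))  # True
```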


In [None]:
data_melt.pivot(index=['Date', 'Drug_Name', 'Parameter'],
                columns='time',
                values='reading').reset_index()

In [None]:
data_melt.head()

Notice that:
* We are still using 2 rows (one for Temperature, one for Pressure) to log the readings of a single experiment.

**Can we further restructure our data by splitting the Parameter column into Temperature and Pressure columns?**
* A format like Date | time | Drug_Name | Pressure | Temperature would be suitable.
* We want to split a single column into multiple columns.


**How can we split the Parameter column?**

In [None]:
data_tidy = data_melt.pivot(index=['Date', 'time', 'Drug_Name'],
                            columns='Parameter',
                            values='reading')
data_tidy


Notice that a multi-index dataframe has been created.
##### We can use reset_index() to turn the index levels back into regular columns.

In [None]:
data_tidy = data_tidy.reset_index()
data_tidy


In [None]:
# The column axis still carries the name "Parameter"; set it to None to remove it.
data_tidy.columns.name = None
data_tidy.head()
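The three steps (pivot, reset_index, clearing columns.name) can be tried end to end on a small invented long-format sample with the same columns as data_melt. DrugA, the date and the readings are hypothetical; the point is that each result row holds one Temperature and one Pressure reading, matching the Date | time | Drug_Name | Pressure | Temperature layout.

```python
import pandas as pd

# Hypothetical long-format sample shaped like data_melt (values invented).
long_df = pd.DataFrame({
    'Date': ['2021-01-01'] * 4,
    'Drug_Name': ['DrugA'] * 4,
    'Parameter': ['Temperature', 'Pressure', 'Temperature', 'Pressure'],
    'time': ['1:30:00', '1:30:00', '2:30:00', '2:30:00'],
    'reading': [23.0, 12.0, 22.0, 13.0],
})

# Spread Parameter into one column per parameter, keyed by (Date, time, Drug_Name).
tidy = long_df.pivot(index=['Date', 'time', 'Drug_Name'],
                     columns='Parameter', values='reading')
tidy = tidy.reset_index()  # index levels -> regular columns
tidy.columns.name = None   # remove the 'Parameter' label from the column axis

print(tidy)
```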


**Can we use pivot to find the day-wise mean value of temperature for each drug?**

In [None]:
# This raises a ValueError: each (Drug_Name, Date) pair has several readings
data_tidy.pivot(index=['Drug_Name'],
                columns='Date',
                values=['Temperature'])

**Why did we get an error?**

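* We need to find the average of the temperature values throughout a day.
* The error message reports duplicate entries: each (Drug_Name, Date) pair has one Temperature reading for every timestamp, so pivot() cannot place a single value in each cell.

Hence, for pivot() to work, each combination of index and column values must be unique.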
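The failure can be reproduced on a tiny invented sample: one hypothetical drug (DrugA) with two Temperature readings on the same date. The sketch also shows one way to detect the problem before calling pivot(), using DataFrame.duplicated on the index and column keys.

```python
import pandas as pd

# Hypothetical tidy sample: one drug, one date, two readings (values invented).
tidy = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01'],
    'time': ['1:30:00', '2:30:00'],
    'Drug_Name': ['DrugA', 'DrugA'],
    'Temperature': [23.0, 25.0],
})

# Both rows share the same (Drug_Name, Date) pair, so pivot() has two
# candidate values for one cell and refuses to choose.
raised = False
try:
    tidy.pivot(index='Drug_Name', columns='Date', values='Temperature')
except ValueError as err:
    raised = True
    print('ValueError:', err)

# Check uniqueness up front: True means pivot() will fail.
has_dupes = tidy.duplicated(subset=['Drug_Name', 'Date']).any()
print(has_dupes)  # True
```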


**What can we do to get our required mean values then?**


In [None]:
pd.pivot_table(data_tidy, index='Drug_Name', columns='Date', values=['Temperature'], aggfunc='mean')

This function is similar to pivot(), but it can also aggregate duplicate entries.

**How does pivot_table() work?**


* The index, columns and values parameters are the same as in pivot().
* The extra aggfunc parameter specifies how duplicate entries are aggregated (it defaults to 'mean').
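A minimal sketch on an invented sample (DrugA, the dates and the readings are hypothetical) shows pivot_table() averaging the duplicate (Drug_Name, Date) entries that made pivot() fail. Here values is passed as a plain string; passing a list such as ['Temperature'], as in the cell above, keeps an extra 'Temperature' level on top of the columns.

```python
import pandas as pd

# Invented sample: two Temperature readings for DrugA on 2021-01-01, one on 2021-01-02.
tidy = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
    'time': ['1:30:00', '2:30:00', '1:30:00'],
    'Drug_Name': ['DrugA', 'DrugA', 'DrugA'],
    'Temperature': [23.0, 25.0, 20.0],
})

# Rows sharing (Drug_Name, Date) are grouped and averaged, so the
# duplicates no longer cause an error.
daily_mean = pd.pivot_table(tidy, index='Drug_Name', columns='Date',
                            values='Temperature', aggfunc='mean')
print(daily_mean)
```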

**Note:**
* We could have done this using groupby too.
* In fact, pivot_table uses groupby internally to group the data and perform the aggregation.
* The main difference is the shape of the output: groupby returns one row per group, while pivot_table spreads one of the grouping keys across the columns.
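The groupby route can be sketched on an invented sample (DrugA, DrugB, the dates and readings are hypothetical): groupby followed by unstack() produces the same table as pivot_table(), with NaN where a drug has no reading on a date.

```python
import pandas as pd

# Invented sample (hypothetical drugs, dates and readings).
tidy = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-01'],
    'Drug_Name': ['DrugA', 'DrugA', 'DrugA', 'DrugB'],
    'Temperature': [23.0, 25.0, 20.0, 30.0],
})

# groupby gives a long result: one row per (Drug_Name, Date) group.
long_mean = tidy.groupby(['Drug_Name', 'Date'])['Temperature'].mean()

# unstack() moves the Date level into the columns, giving the pivot_table shape.
wide_mean = long_mean.unstack('Date')

pt = pd.pivot_table(tidy, index='Drug_Name', columns='Date',
                    values='Temperature', aggfunc='mean')
print(wide_mean.equals(pt))  # True
```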


**Similarly, what if we want to find the minimum values of temperature and pressure for each drug on each date?**

In [None]:
pd.pivot_table(data_tidy, index='Drug_Name', columns='Date', values=['Temperature', 'Pressure'], aggfunc='min')
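With several value columns, the result has two column levels: the parameter on top and the date below it. A small sketch on invented data (DrugA, the dates and the readings are hypothetical) shows how to select a single parameter back out as a plain Drug_Name × Date table.

```python
import pandas as pd

# Invented tidy sample with both parameters (hypothetical values).
tidy = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02'],
    'time': ['1:30:00', '2:30:00', '1:30:00'],
    'Drug_Name': ['DrugA', 'DrugA', 'DrugA'],
    'Temperature': [23.0, 25.0, 20.0],
    'Pressure': [12.0, 10.0, 14.0],
})

daily_min = pd.pivot_table(tidy, index='Drug_Name', columns='Date',
                           values=['Temperature', 'Pressure'], aggfunc='min')

# Columns form a MultiIndex of (parameter, date) pairs.
print(daily_min.columns.nlevels)  # 2
# Selecting the top level gives an ordinary Drug_Name x Date table.
print(daily_min['Pressure'])
```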