### Pivot
**```df.pivot(index=None, columns=None, values=None)```**<br>
**```pandas.pivot(data, index=None, columns=None, values=None)```**

Where,
- `index`: Column to use as new dataframe's index. If None, uses existing index.
- `columns`: Column to use to make new dataframe columns.
- `values`: Column(s) to use for populating new frame's values. If not specified, all remaining columns are used.

**``columns is a required argument in pivot; index is optional and falls back to the existing index``**
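
The call described above can be tried on a small in-memory frame. This is only a sketch: the `weather` frame and its values are made up for illustration and are not one of the CSV files used later in this notebook.

```python
import pandas as pd

# Hypothetical data: every (city, date) pair occurs exactly once.
weather = pd.DataFrame({
    "date": ["2024-05-01", "2024-05-02", "2024-05-01", "2024-05-02"],
    "city": ["Lahore", "Lahore", "Karachi", "Karachi"],
    "temperature": [35, 36, 30, 31],
    "humidity": [40, 42, 70, 72],
})

# Long to wide: one row per city, one column per date.
temps = weather.pivot(index="city", columns="date", values="temperature")
print(temps)

# Without `values`, all remaining columns are pivoted and the result
# gets two-level column labels such as ("humidity", "2024-05-01").
wide = weather.pivot(index="city", columns="date")
print(wide.columns.tolist())

# Leaving out `columns` fails (the exception type depends on the pandas version).
try:
    weather.pivot(index="city")
except (TypeError, KeyError) as exc:
    print("pivot failed:", exc)
```

Each (city, date) pair must be unique here, because `pivot` only rearranges values and never aggregates them.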

### Pivot_table
**`df.pivot_table(index=None, columns=None, values=None, aggfunc='mean', fill_value=None, sort=True, margins=False)`**<br>
**`pandas.pivot_table(data, index=None, columns=None, values=None, aggfunc='mean', fill_value=None, sort=True, margins=False)`**

Where,
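- `index`: Column(s) to group by; their unique values become the rows of the new dataframe.
- `columns`: Column(s) to group by; their unique values become the columns of the new dataframe.
- `values`: Column(s) to aggregate and use for populating the new frame's values.
- `aggfunc`: Aggregation function applied to each group; the default is `'mean'`.
- `fill_value`: Value to replace missing values with in the resulting table.
- `sort`: Whether the result should be sorted; the default is True.
- `margins`: If True, adds row/column subtotals; the default is False.

**``Unlike pivot, pivot_table also works with only one of index or columns specified``**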
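
A minimal sketch of how `pivot_table` differs from `pivot` when the data contains duplicates. The `readings` frame is hypothetical and only serves to make the example self-contained.

```python
import pandas as pd

# Hypothetical data: two readings per (date, city) pair,
# and no reading at all for Murree on the second date.
readings = pd.DataFrame({
    "date": ["d1", "d1", "d1", "d1", "d2", "d2"],
    "city": ["Lahore", "Lahore", "Murree", "Murree", "Lahore", "Lahore"],
    "temperature": [30, 34, 10, 14, 31, 33],
})

# pivot() cannot reshape duplicated (index, columns) pairs.
try:
    readings.pivot(index="date", columns="city", values="temperature")
except ValueError as exc:
    print("pivot failed:", exc)

# pivot_table() aggregates the duplicates instead (mean by default).
mean_table = readings.pivot_table(index="date", columns="city", values="temperature")

# Another aggregate, with the empty (d2, Murree) cell filled with 0.
max_table = readings.pivot_table(
    index="date", columns="city", values="temperature",
    aggfunc="max", fill_value=0,
)

# margins=True adds an "All" row and column holding the subtotals.
totals = readings.pivot_table(
    index="date", columns="city", values="temperature",
    aggfunc="sum", margins=True,
)

# Only `index` is given: one row per city, aggregated over all dates.
by_city = readings.pivot_table(index="city", values="temperature")

print(mean_table, max_table, totals, by_city, sep="\n\n")
```

Passing `values` explicitly keeps non-numeric columns such as `date` out of the aggregation.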

### df_melt
- Like `pivot()` and `pivot_table()`, the Pandas `melt()` method is used to transform or reshape data.
- The `pd.melt()` method changes the DataFrame format from **wide to long**.
- It reshapes a DataFrame into a format where one or more columns are identifier variables (`id_vars`), while all other columns are treated as measured variables (`value_vars`) and are unpivoted into rows. Its signature is:

**`pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', ignore_index=True)`**

Where,
- `id_vars`: tuple, list, or ndarray, optional  (Column(s) to use as identifier variables)
- `value_vars`: tuple, list, or ndarray, optional (If not specified, uses all columns that are not set as id_vars)
- `var_name`: Name to use for the ‘variable’ column. If None it uses **frame.columns.name** or **‘variable’**.
- `value_name`: Name to use for the ‘value’ column.
- `ignore_index`: bool, default True (If True, original index is ignored. If False, the original index is retained.)
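
The parameters above can be seen in action with a small sketch. The `wide` frame below is hypothetical and stands in for a table with one temperature column per city.

```python
import pandas as pd

# Hypothetical wide table: one column per city.
wide = pd.DataFrame({
    "day": ["Mon", "Tue", "Wed"],
    "karachi": [32, 33, 31],
    "lahore": [38, 39, 40],
})

# Wide to long: every non-id column becomes (city, temperature) rows.
long = pd.melt(wide, id_vars=["day"], var_name="city", value_name="temperature")
print(long)

# Restrict the measured variables to a single column.
karachi_only = pd.melt(
    wide, id_vars=["day"], value_vars=["karachi"],
    var_name="city", value_name="temperature",
)

# The same rows can be selected from the long frame with Boolean indexing.
karachi_rows = long[long["city"] == "karachi"]
karachi_mean = karachi_rows["temperature"].mean()
print(karachi_mean)
```

With `var_name` and `value_name` left at their defaults, the two new columns would be called `variable` and `value`.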


### cross tab
- The `pd.crosstab()` method is also used for data restructuring and reshaping.
- It is normally used for **quickly comparing categorical variables.**
- The cross table is also known as **contingency table**, which is a matrix type table that displays the (multivariate) **frequency distribution** of variables.
```
pandas.crosstab(index, 
                columns, 
                aggfunc=None,
                values=None,
                margins=False, 
                normalize=False)
```
Where,
- `index`: array-like, Series, or list of arrays/Series (Values to group by in the rows)
- `columns`: array-like, Series, or list of arrays/Series (Values to group by in the columns)
- `values`: array-like, optional (Array of values to aggregate according to the factors. Requires `aggfunc` to be specified.)
- `aggfunc`: function, optional (If specified, requires `values` to be specified as well.)
- `margins`: bool, default False (Add row/column margins, i.e. subtotals.)
- `normalize`: bool, {‘all’, ‘index’, ‘columns’}, or {0,1}, default False (Normalize by dividing the values by their sum, so the table shows proportions instead of counts: `'all'` or `True` divides by the grand total, `'index'` by each row total, and `'columns'` by each column total.)
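
The options above can be sketched on a tiny frame. The `people` data is invented for illustration and is not the sample CSV read later in this notebook.

```python
import pandas as pd

# Hypothetical data: six people in two cities.
people = pd.DataFrame({
    "city": ["Lahore", "Lahore", "Lahore", "Karachi", "Karachi", "Karachi"],
    "gender": ["M", "F", "M", "F", "F", "M"],
    "age": [20, 30, 40, 25, 35, 45],
})

# Frequency table: how many people of each gender live in each city.
counts = pd.crosstab(index=people["city"], columns=people["gender"])

# The same table with an "All" row and column of subtotals.
with_totals = pd.crosstab(people["city"], people["gender"], margins=True)

# Proportions of the grand total (all cells together sum to 1).
shares = pd.crosstab(people["city"], people["gender"], normalize=True)

# Proportions within each city (each row sums to 1).
row_shares = pd.crosstab(people["city"], people["gender"], normalize="index")

# Aggregate another column instead of counting: mean age per cell.
mean_age = pd.crosstab(
    people["city"], people["gender"],
    values=people["age"], aggfunc="mean",
)

# `values` without `aggfunc` is rejected.
try:
    pd.crosstab(people["city"], people["gender"], values=people["age"])
except ValueError as exc:
    print("crosstab failed:", exc)

print(counts, with_totals, shares, row_shares, mean_age, sep="\n\n")
```

Note that `crosstab` takes the arrays or Series themselves (`people["city"]`), not column names.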

In [None]:
#import pandas as pd
#df = pd.read_csv('datasets/pivot_weather1.csv')
#df
# pivot() and pivot_table() give the same result on this dataset
#df.pivot(index="city", columns="date") # pivot() requires `columns`; `index` is optional
#df.pivot(index="city") # raises an error because `columns` is missing
#df.pivot(columns="city") # `index` is omitted, so the existing index is used
#df.pivot_table(index="city") # pivot_table() also works with only `index`
#df.pivot_table(columns="city") # ... or with only `columns`
                                # older pandas versions silently drop columns that cannot be aggregated; newer versions may raise an error instead

# By default the result is sorted by the index and column labels that we pass,
# because sort=True is the default in pivot_table(). pivot() behaves similarly, with one difference:
# compare the two by uncommenting the lines below one at a time

#df.pivot(index="city", columns="date") # temperature comes first: the value columns keep their original order
#df.pivot_table(index="city", columns="date") # humidity comes first: the value columns are sorted as well

#df.pivot(index="city", columns="date", values="temperature")
#df.pivot_table(index="city", columns="date", values="temperature")

# An aggregate can also be computed directly on a column
#df['temperature'].agg('mean')
#df[df['city'] == 'Lahore'].temperature.agg('mean')

#import pandas as pd
#df = pd.read_csv('datasets/pivot_weather2.csv')

#df1 = df.pivot(index='date', columns='city') # raises an error: the (date, city) pairs contain duplicates
#df1
#df1 = df.pivot_table(index='date', columns='city') # aggregates the duplicates and returns their mean
#df1

#df1 = df.pivot_table(index='date', columns='city', aggfunc='prod') # the default aggfunc is mean; any other aggregate function name can be passed,
                                                                    # such as sum, min, max, prod, median, std (standard deviation), describe, count
# Filling missing values
#import pandas as pd
#df = pd.read_csv('datasets/pivot_weather2_missing_values.csv') # temperature and humidity of Murree are missing for date 20
#df
#df.pivot_table(index='temperature', columns='city')
#df.pivot_table(index='temperature', columns='city', fill_value=0)
    
#import pandas as pd
#df = pd.read_csv('datasets/pivot_std1.csv')

#df.pivot_table(index='gender', columns='sport', margins=True) # the default aggfunc is mean, so the cells and margins show mean values
#df.pivot_table(index='gender', columns='sport', aggfunc='sum', margins=True) # the cells and margins show sums
#df.pivot_table(index='gender', columns='sport', aggfunc='prod', margins=True)


# MELT

#df = pd.read_csv('datasets/weather.csv')
#df
#pd.melt(df, id_vars=['day']) # long format: one row per (day, column) pair
#pd.melt(df, id_vars=['day'], value_vars=['karachi'], var_name='city', value_name='temperature') # only for karachi

# A similar result can be achieved with Boolean indexing on the fully melted frame
#df1 = pd.melt(df, id_vars=['day'], var_name='city', value_name='temperature')
#df1[df1['city'] == 'karachi']

# Compute the average temperature of karachi only
#df1[df1['city'] == 'karachi'].temperature.agg('mean')

# cross tab

# Reading data from the 'datasets/sample1.csv' file
#import numpy as np
#import pandas as pd
#df = pd.read_csv('datasets/sample1.csv')
#df
#pd.crosstab(index=df.city, columns=df.gender) # takes the Series df.city and df.gender and computes the frequency distribution of males and females in each city
#pd.crosstab(index=df.city, columns=df.gender, margins=True) # also adds subtotals
#pd.crosstab(index=df.city, columns=df.gender, normalize=True) # gives proportions: each count is divided by the grand total (male + female = 14)
#pd.crosstab(index=df.city, columns=df.gender, margins=True, normalize=True)
#pd.crosstab(index=df.city, columns=df.gender, values=df.age, aggfunc=np.mean) # values must always be passed together with aggfunc
#pd.crosstab(index=df.city, columns=df.gender, values=df.age) # raises an error because aggfunc is missing