In [20]:
import numpy as np
import pandas as pd

## Dictionaries 
###  (a.k.a. `Dicts`)

My blog post on this topic: [Python Dictionaries: Key-Value Pair Mapping](https://pudasainimohan.com.np/post/python_dictionary/)  
Making `Dicts`:

| Code | Output | Description |
| :--  | :--    | :---        |
| `x = {'a': 3, 'b': 4}` | `{'a': 3, 'b': 4}`  | Make a `dict` using curly braces and colons |
| `dict([('a', 3), ('b', 4)])` | `{'a': 3, 'b': 4}` | Transform a list of pairs of data into a `dict` |
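As a quick check, both construction styles in the table produce the same `dict`. The keyword-argument form `dict(a=3, b=4)`, which is not in the table, is a third option; it only works when every key is a valid Python identifier.

```python
# The two ways from the table above, plus the keyword-argument form
x = {'a': 3, 'b': 4}
y = dict([('a', 3), ('b', 4)])
z = dict(a=3, b=4)  # keys must be valid identifiers for this form

# Dicts compare equal when they hold the same key-value pairs
print(x == y == z)  # True
```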


Some Dict Methods:

| Function Syntax | Method Syntax | Example Output | Description |
| :---  | :--- | :--- | :--- |
| `dict.keys(x)` | `x.keys()` | `dict_keys(['a', 'b'])` | The keys of the dictionary (these are like the "indices" of the dict) |
| `dict.values(x)` | `x.values()` | `dict_values([3, 4])` | The values of the dictionary (the data you can extract from the dict) |
| `dict.items(x)`  | `x.items()` | `dict_items([('a', 3), ('b', 4)])` | The item pairs in the dictionary |
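The function syntax and the method syntax in the table do the same thing. One detail worth knowing: `keys()`, `values()` and `items()` return *views*, which reflect later changes to the dict instead of copying its contents at the moment of the call.

```python
x = {'a': 3, 'b': 4}

# Function syntax and method syntax give equivalent results
print(dict.keys(x) == x.keys())   # True

# Views are "live": they update when the dict changes
keys = x.keys()
x['c'] = 5
print(list(keys))                 # ['a', 'b', 'c']
```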



**Exercises**: Let's explore some dictionaries and get a feel for them.

Make a `birthdays` dictionary for your three friends, so that you can look up their names and get their birthday:
  - Rajesh's birthday is Dec. 1st
  - Amit's birthday is May 4th
  - Krishna's birthday is July 23rd

In [1]:
birthdays = {'Rajesh': 'Dec. 1st', 'Amit': 'May 4th', 'Krishna': 'July 23rd'}
birthdays

{'Rajesh': 'Dec. 1st', 'Amit': 'May 4th', 'Krishna': 'July 23rd'}

Use the `birthdays` dict above to do the exercises.

What is Rajesh's birthday?

In [3]:
birthdays['Rajesh']

'Dec. 1st'

Whose birthday is on Dec. 1st?

In [5]:
# Invert the dict so that dates become keys and names become values
friends = {date: name for name, date in birthdays.items()}
friends['Dec. 1st']

'Rajesh'
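Inverting a dict this way assumes every value is unique. If two friends shared a birthday, the later entry would silently overwrite the earlier one, because dict keys must be unique. A sketch with a hypothetical `shared` dict (the friend `Sunita` is made up for illustration) that groups names by date instead:

```python
# Hypothetical data: two friends share a birthday
shared = {'Rajesh': 'Dec. 1st', 'Amit': 'May 4th', 'Sunita': 'Dec. 1st'}

# Plain inversion: 'Sunita' overwrites 'Rajesh' under the same key
naive = {date: name for name, date in shared.items()}
print(naive)    # {'Dec. 1st': 'Sunita', 'May 4th': 'Amit'}

# Grouping keeps everyone: each date maps to a list of names
by_date = {}
for name, date in shared.items():
    by_date.setdefault(date, []).append(name)
print(by_date)  # {'Dec. 1st': ['Rajesh', 'Sunita'], 'May 4th': ['Amit']}
```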

What are the `values` in the birthdays dict?

In [6]:
birthdays.values()

dict_values(['Dec. 1st', 'May 4th', 'July 23rd'])

What are the `keys` in the birthdays dict?

In [7]:
birthdays.keys()

dict_keys(['Rajesh', 'Amit', 'Krishna'])

What are the `items` in the birthdays dict?

In [8]:
birthdays.items()

dict_items([('Rajesh', 'Dec. 1st'), ('Amit', 'May 4th'), ('Krishna', 'July 23rd')])

Make a dict from this list of tuples:

In [9]:
brightnesses = [
    ('red', 65), 
    ('green', 3), 
    ('blue', 10)]

In [10]:
dict(brightnesses)

{'red': 65, 'green': 3, 'blue': 10}
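The conversion also works in reverse: `list(d.items())` turns a dict back into a list of pairs. Since Python 3.7, dicts keep insertion order, so the round trip returns the pairs in their original order (as long as no key was repeated in the original list).

```python
brightnesses = [('red', 65), ('green', 3), ('blue', 10)]

as_dict = dict(brightnesses)   # list of pairs -> dict
back = list(as_dict.items())   # dict -> list of pairs

print(back == brightnesses)    # True
print(as_dict['green'])        # 3: look up a value by its key
```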

## Analysing Data stored in Dicts

The challenge with analyzing `dict` data is that dicts are not "sequences", and neither are the `dict_keys` and `dict_values` views returned by `keys()` and `values()`. NumPy cannot treat these views as arrays of numbers, so before passing them to a statistics function such as `np.mean()`, first convert them to a `list` with the `list()` function. For example:

```python
>>> data = {'x': 1, 'y': 2}

>>> data.values()
dict_values([1, 2])

>>> list(data.values())
[1, 2]

>>> np.mean(list(data.values()))
1.5
```
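Besides `list()`, two other options work here: `np.fromiter()`, which builds a NumPy array from any iterable, and the standard library's `statistics.mean()`, which iterates over the view itself. They are shown only as alternatives; the exercises below use `list()`.

```python
import statistics

import numpy as np

data = {'x': 1, 'y': 2}

# Option 1: convert the view to a list first (as in the example above)
m1 = np.mean(list(data.values()))

# Option 2: build a float array straight from the view
m2 = np.fromiter(data.values(), dtype=float).mean()

# Option 3: the statistics module accepts the view directly
m3 = statistics.mean(data.values())

print(m1, m2, m3)  # 1.5 1.5 1.5
```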

Useful functions for the exercises below:

| Function | Example | Description |
| :----  | :----   | :---- |
| `len()` | `len(the_dict)` | The total number of items |
| `np.mean()` | `np.mean(list(the_dict.values()))` | The mean of the dict's values |
| `np.min()` | `np.min(list(the_dict.values()))` | The minimum of the dict's values |
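Here are the functions from the table applied to a small hypothetical `scores` dict, together with `max(d, key=d.get)`, a standard-library idiom for finding the *key* whose value is largest (something `np.max()` alone cannot tell you):

```python
import numpy as np

scores = {'a': 4, 'b': 9, 'c': 2}  # hypothetical example data
values = list(scores.values())

n = len(scores)                          # number of items: 3
avg = np.mean(values)                    # mean of the values: 5.0
lowest = np.min(values)                  # minimum value: 2
best_key = max(scores, key=scores.get)   # key with the largest value: 'b'

print(n, avg, lowest, best_key)
```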

In [15]:
import numpy as np

**Exercises**: Let's get some practice querying dicts and calculating some statistics on dicts using NumPy.

Using the following dict, calculate the average number of hours of sleep our friends got last night:

In [12]:
hours_of_sleep = {'Ram': 5, 'Sita': 9, 'Gopal': 7, 'Rita': 6, 'Mohan': 8}

In [16]:
np.mean(list(hours_of_sleep.values()))

7.0

How many people were in our sleep study?

In [17]:
len(hours_of_sleep)

5
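Beyond totals and averages, a dict comprehension lets you query the data, for example to find who slept at least 7 hours (the 7-hour threshold is just an illustrative choice):

```python
hours_of_sleep = {'Ram': 5, 'Sita': 9, 'Gopal': 7, 'Rita': 6, 'Mohan': 8}

# Keep only the entries whose value passes the test
well_rested = {name: h for name, h in hours_of_sleep.items() if h >= 7}

print(well_rested)       # {'Sita': 9, 'Gopal': 7, 'Mohan': 8}
print(len(well_rested))  # 3
```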

In [18]:
!pip show pandas

Name: pandas
Version: 1.5.3
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
License: BSD-3-Clause
Location: c:\users\custo\anaconda3\lib\site-packages
Requires: numpy, python-dateutil, pytz
Required-by: datashader, holoviews, hvplot, pyreadstat, seaborn, statsmodels, xarray


In [21]:
import pandas as pd

# Pandas DataFrames

## What is a DataFrame?

A DataFrame is a **table** of data. It contains multiple rows, and each row holds the same labelled set of fields (the columns). The rows are labelled too; these row labels are called the **index**. For example, a DataFrame might look like this:

| (index) | Name | Age | Height | LikesIceCream |
| :---: | :--: | :--: | :--: | :--: |
| 0     | "Ram" | 22 |5.3 | True |
| 1     | "Sita" | 55 | 5 | True |
| 2     | "Hari"  | 25 | 5.8 | True |

Because every row has the same fields, a DataFrame can also be thought of as a collection of same-length columns!
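To make the "collection of columns" idea concrete, here is the example table above built as a DataFrame. `df.index` holds the row labels, `df.columns` the column names, and each column pulled out with `df['Age']` is a pandas `Series` with one value per row:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ram', 'Sita', 'Hari'],
    'Age': [22, 55, 25],
    'Height': [5.3, 5, 5.8],
    'LikesIceCream': [True, True, True],
})

print(df.shape)          # (3, 4): 3 rows, 4 columns
print(list(df.index))    # [0, 1, 2]: the default row labels
print(list(df.columns))  # ['Name', 'Age', 'Height', 'LikesIceCream']
print(len(df['Age']))    # 3: every column has one value per row
```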

**Pandas** is a Python package that provides a `DataFrame` class. You can build a DataFrame directly from Python data structures, or use one of pandas' many `read_xxx()` functions to load one from a variety of sources.

## Making DataFrames Directly

### Examples of Different Ways

#### From a List of Dicts

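Dicts are named collections. If you have a list of dicts that use the same kind of keys, the `pd.DataFrame` constructor turns each dict into a row and each key into a column. A key missing from one of the dicts becomes `NaN` (missing) in that row, as with `Weight` below: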

In [24]:
friends = [
    {'Name': "Ram", "Age": 31, "Height": 5.3, "Weight": 60},
    {'Name': "Sita", "Age": 55, "Height": 5},
    {"Name": "Hari", "Height": 5.8, "Age": 25 },
]
pd.DataFrame(friends)

| (index) | Name | Age | Height | Weight |
| :---: | :---: | :---: | :---: | :---: |
| 0 | Ram | 31 | 5.3 | 60.0 |
| 1 | Sita | 55 | 5.0 | NaN |
| 2 | Hari | 25 | 5.8 | NaN |
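The columns appear in the order in which their keys are first seen, and every key found in any of the dicts becomes a column. If you only want certain columns, or a specific order, the `columns=` argument of `pd.DataFrame` selects them:

```python
import pandas as pd

friends = [
    {'Name': "Ram", "Age": 31, "Height": 5.3, "Weight": 60},
    {'Name': "Sita", "Age": 55, "Height": 5},
    {"Name": "Hari", "Height": 5.8, "Age": 25},
]

# Select and order the columns explicitly; keys not listed (here 'Weight') are left out
df = pd.DataFrame(friends, columns=['Name', 'Height', 'Age'])
print(df)
print(list(df.columns))  # ['Name', 'Height', 'Age']
```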


#### From a Dict of Lists

In [25]:
df = pd.DataFrame({
    'Name': ['Ram', 'Sita', 'Hari'], 
    'Age': [31, 55, 25], 
    'Height': [5.3, 5, 5.8],
})

df

| (index) | Name | Age | Height |
| :---: | :---: | :---: | :---: |
| 0 | Ram | 31 | 5.3 |
| 1 | Sita | 55 | 5.0 |
| 2 | Hari | 25 | 5.8 |
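Because a DataFrame is a collection of same-length columns, every list in the dict must have the same length. A sketch of what happens otherwise (the exact error message may vary between pandas versions):

```python
import pandas as pd

ragged = {
    'Name': ['Ram', 'Sita', 'Hari'],
    'Age': [31, 55],  # one value short
}

try:
    pd.DataFrame(ragged)
    error = None
except ValueError as err:  # pandas refuses to build a ragged table
    error = err

print(type(error).__name__, error)
```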


#### From a List of Lists

If you have a collection of same-length sequences, you essentially have a rectangular data structure already! All that's needed is to add some column labels.

In [27]:
friends = [
    ['Ram', 31, 5.3],
    ['Sita', 55, 5],
    ['Hari',  25, 5.8]
]
pd.DataFrame(friends, columns=["Name", "Age", "Height"])

| (index) | Name | Age | Height |
| :---: | :---: | :---: | :---: |
| 0 | Ram | 31 | 5.3 |
| 1 | Sita | 55 | 5.0 |
| 2 | Hari | 25 | 5.8 |
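Here the rows get the default labels 0, 1, 2. If you would rather label rows by something meaningful, `set_index()` makes a column of your choice the index, so you can look rows up by that label with `.loc`:

```python
import pandas as pd

friends = [
    ['Ram', 31, 5.3],
    ['Sita', 55, 5],
    ['Hari', 25, 5.8],
]
df = pd.DataFrame(friends, columns=["Name", "Age", "Height"])

# Use the Name column as the row labels instead of 0, 1, 2
by_name = df.set_index("Name")
print(by_name.loc["Sita", "Age"])  # 55
```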


#### From an empty DataFrame
If you prefer, you can also add columns one at a time, starting with an empty DataFrame:

In [33]:
df = pd.DataFrame()
df['Name'] = ['Ram', 'Sita', 'Hari']
df['Age'] = [31, 55, 25]
df['Height'] = [5.3, 5, 5.8]
df


| (index) | Name | Age | Height |
| :---: | :---: | :---: | :---: |
| 0 | Ram | 31 | 5.3 |
| 1 | Sita | 55 | 5.0 |
| 2 | Hari | 25 | 5.8 |
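New columns can also be computed from existing ones: arithmetic on a column is applied to every row at once. `AgeNextYear` is a hypothetical column name chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame()
df['Name'] = ['Ram', 'Sita', 'Hari']
df['Age'] = [31, 55, 25]

# Adds 1 to every value in the Age column
df['AgeNextYear'] = df['Age'] + 1
print(df)
```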


**Exercises**: Making DataFrames from Scratch

Use pandas to recreate the table below as a DataFrame, using one of the approaches detailed above:

| Year | Product | Cost |
| :--: | :----:  | :--: |
| 2015 | Apples  | 0.35 |
| 2016 | Apples  | 0.45 |
| 2015 | Bananas | 0.75 |
| 2016 | Bananas | 1.10 |

In [32]:
df = pd.DataFrame()
df['Year'] = [2015,2016,2015,2016]
df['Product'] = ['Apples', 'Apples', 'Bananas', 'Bananas']
df['Cost'] = [0.35, 0.45, 0.75, 1.10]
df


| (index) | Year | Product | Cost |
| :---: | :---: | :---: | :---: |
| 0 | 2015 | Apples | 0.35 |
| 1 | 2016 | Apples | 0.45 |
| 2 | 2015 | Bananas | 0.75 |
| 3 | 2016 | Bananas | 1.10 |
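Any of the other approaches produces the same table. For example, here it is also built as a list of lists with column labels, and `DataFrame.equals()` checks that the two results hold the same values with the same dtypes:

```python
import pandas as pd

# Column by column, as in the solution above
df1 = pd.DataFrame()
df1['Year'] = [2015, 2016, 2015, 2016]
df1['Product'] = ['Apples', 'Apples', 'Bananas', 'Bananas']
df1['Cost'] = [0.35, 0.45, 0.75, 1.10]

# Row by row, as a list of lists
df2 = pd.DataFrame(
    [[2015, 'Apples', 0.35],
     [2016, 'Apples', 0.45],
     [2015, 'Bananas', 0.75],
     [2016, 'Bananas', 1.10]],
    columns=['Year', 'Product', 'Cost'],
)

print(df1.equals(df2))  # True
```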


### Reading Data from Files into a DataFrame


| File Format | File Extension | `read_xxx()` Function | DataFrame Write Method |
| :--:  | :--: | :--: | :--: |
| Comma-Separated Values | .csv | `pd.read_csv()` | `df.to_csv()` |
| Tab-Separated Values | .tsv, .tabular, .csv | `pd.read_csv(sep='\t')`, `pd.read_table()` | `df.to_csv(sep='\t')` |
| Excel Spreadsheet | .xls | `pd.read_excel()` | `df.to_excel()` |
| Excel Spreadsheet | .xlsx | `pd.read_excel()` | `df.to_excel()` |
| Stata | .dta | `pd.read_stata()` | `df.to_stata()` |
| SPSS | .sav | `pd.read_spss()` | none in pandas; use `pyreadstat.write_sav()` |
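A minimal round trip through the tab-separated format, written to an in-memory buffer (`io.StringIO`) instead of a file so that it runs anywhere; the same `sep` argument applies when you pass a file name instead:

```python
import io

import pandas as pd

df = pd.DataFrame({'Year': [2015, 2016], 'Product': ['Apples', 'Bananas']})

buffer = io.StringIO()
df.to_csv(buffer, sep='\t', index=False)  # write tab-separated text

text = buffer.getvalue()
print(text.splitlines()[0])               # header row: Year<TAB>Product

buffer.seek(0)                            # rewind before reading
df_back = pd.read_csv(buffer, sep='\t')   # read it back with the same separator
print(df_back.equals(df))                 # True
```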


In [None]:
import pandas as pd

### Understanding Different File Formats


**Exercises**:

Run the code below to download the Titanic passengers dataset, then save it in several different file formats.

*Note*: you can pass a web URL to `pd.read_csv()`, and pandas reads it just like a local file!

In [34]:
url = 'https://raw.githubusercontent.com/pudasainimohan/Materials/main/data/titanic.csv'
df = pd.read_csv(url)
df

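| (index) | survived | pclass | sex | age | sibsp | parch | fare | embarked | deck | embark_town | alone |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | NaN | Southampton | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | C | Cherbourg | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.9250 | S | NaN | Southampton | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1000 | S | C | Southampton | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.0500 | S | NaN | Southampton | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.0 | 0 | 0 | 13.0000 | S | NaN | Southampton | True |
| 887 | 1 | 1 | female | 19.0 | 0 | 0 | 30.0000 | S | B | Southampton | True |
| 888 | 0 | 3 | female | NaN | 1 | 2 | 23.4500 | S | NaN | Southampton | False |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | C | Cherbourg | True |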


**Data Dictionary**

`survived` : whether the passenger survived or not       
`pclass` : the passenger's ticket class (first, second, or third)              
`sex` : the passenger's gender (male or female)       
`age` : the passenger's age       
`sibsp` : the number of siblings/spouses the passenger had aboard       
`parch` : the number of parents/children the passenger had aboard       
`fare` : the fare paid by the passenger for their ticket          
`embarked` : the port of embarkation (S = Southampton, C = Cherbourg, Q = Queenstown)       
`embark_town` : the full name of the port of embarkation       
`deck` : the deck on which the passenger's cabin was located (A to G)        
`alone` : whether the passenger was traveling alone or with family members      

In [35]:
pd.set_option('display.max_rows', 5)  # set display options: truncate DataFrames longer than 5 rows
df

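| (index) | survived | pclass | sex | age | sibsp | parch | fare | embarked | deck | embark_town | alone |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | NaN | Southampton | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | C | Cherbourg | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | C | Cherbourg | True |
| 890 | 0 | 3 | male | 32.0 | 0 | 0 | 7.7500 | Q | NaN | Queenstown | True |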


Now run the code below to save the DataFrame to a comma-separated values (CSV) file using the `DataFrame.to_csv()` method, then open the saved file in a text editor. How is the file structured?

In [37]:
df.to_csv("titanic.csv", index=False)
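To see the CSV structure without opening an editor, call `to_csv()` without a path: it then returns the CSV text as a string. The tiny hypothetical `sample` frame stands in for the Titanic data. The first line is the header, and every following line is one row with its values separated by commas:

```python
import pandas as pd

sample = pd.DataFrame({'survived': [0, 1], 'sex': ['male', 'female'], 'age': [22.0, 38.0]})

text = sample.to_csv(index=False)  # no path given, so the CSV is returned as a str
print(text)
# survived,sex,age
# 0,male,22.0
# 1,female,38.0
```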

Now read the file back into Python using the `pd.read_csv()` function:

In [41]:
pd.read_csv('titanic.csv')

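| (index) | survived | pclass | sex | age | sibsp | parch | fare | embarked | deck | embark_town | alone |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | NaN | Southampton | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | C | Cherbourg | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | C | Cherbourg | True |
| 890 | 0 | 3 | male | 32.0 | 0 | 0 | 7.7500 | Q | NaN | Queenstown | True |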
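The `index=False` used when saving the file matters. Without it, `to_csv()` also writes the row labels as an unnamed first column, and `read_csv()` brings that back as an extra column called `Unnamed: 0`, unless you tell it to use that column as the index with `index_col=0`. A sketch using a small hypothetical `sample` frame and an in-memory buffer:

```python
import io

import pandas as pd

sample = pd.DataFrame({'survived': [0, 1], 'sex': ['male', 'female']})

# Written WITH the index (the default)
with_index = io.StringIO(sample.to_csv())
first = pd.read_csv(with_index)
print(list(first.columns))     # ['Unnamed: 0', 'survived', 'sex']

# Same text, but the first column becomes the index on the way back in
with_index.seek(0)
restored = pd.read_csv(with_index, index_col=0)
print(list(restored.columns))  # ['survived', 'sex']
```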


**Excel**

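Save the DataFrame to an Excel file using the `DataFrame.to_excel()` method (writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed):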

In [42]:
df.to_excel('titanic.xlsx', index=False)

Read the Excel file back into pandas using the `pd.read_excel()` function:

In [43]:
pd.read_excel('titanic.xlsx')

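| (index) | survived | pclass | sex | age | sibsp | parch | fare | embarked | deck | embark_town | alone |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S | NaN | Southampton | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | C | Cherbourg | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | C | Cherbourg | True |
| 890 | 0 | 3 | male | 32.0 | 0 | 0 | 7.7500 | Q | NaN | Queenstown | True |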


**Stata**

Save the DataFrame to a Stata file using the `df.to_stata()` method.

In [45]:
df.to_stata('titanic.dta')

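Read the Stata file back into pandas using the `pd.read_stata()` function. Notice the extra `index` column (by default `to_stata()` also writes the index; pass `write_index=False` to leave it out) and that the boolean `alone` column comes back as 0/1, because Stata has no boolean type.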

In [46]:
pd.read_stata('titanic.dta')

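| (index) | index | survived | pclass | sex | age | sibsp | parch | fare | embarked | deck | embark_town | alone |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| 0 | 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.2500 | S |  | Southampton | 0 |
| 1 | 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | C | Cherbourg | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 889 | 889 | 1 | 1 | male | 26.0 | 0 | 0 | 30.0000 | C | C | Cherbourg | 1 |
| 890 | 890 | 0 | 3 | male | 32.0 | 0 | 0 | 7.7500 | Q |  | Queenstown | 1 |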
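A sketch of the difference `write_index=False` makes, using a small hypothetical `sample` frame saved to a temporary directory (the file name `sample.dta` is made up for illustration):

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame({'sex': ['male', 'female'], 'alone': [False, True]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'sample.dta')    # hypothetical file name

    sample.to_stata(path)                     # default: the index is written too
    default = pd.read_stata(path)

    sample.to_stata(path, write_index=False)  # leave the index out
    no_index = pd.read_stata(path)

print(list(default.columns))       # ['index', 'sex', 'alone']
print(list(no_index.columns))      # ['sex', 'alone']
print(no_index['alone'].tolist())  # [0, 1]: booleans are stored as integers
```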


**SPSS**

pandas has no built-in method for saving a DataFrame in SPSS format (it can only read SPSS files, with `pd.read_spss()`, which itself relies on `pyreadstat`). Instead, you can write the DataFrame to an SPSS file with the `write_sav()` function from the `pyreadstat` module. First, make sure the module is installed with `pip install pyreadstat`.

In [47]:
!pip install pyreadstat



In [48]:
import pyreadstat

In [49]:
pyreadstat.write_sav(df,'titanic.sav')