# Pandas notebook

## Getting started with Pandas

Start by loading the `pandas` library (with alias `pd`) then load the dataset `airfoil2.csv` using the `read_csv()` function; call the corresponding dataframe `df`.
Use the `head()` method to show how `df` looks like.

Note that this `read_csv()` function is very flexible and can accomodate all sorts of file. 
You will do this in much more details in module 2.
For now, we're giving you a nicely formatted dataset that directly works well with Pandas.

In [1]:
# import the library
import pandas as pd
# load the dataframe, use "head" to have a look
df = pd.read_csv('data/airfoil.csv')
df.head(3)
#df.shape
#df.dtypes
#print ("shape is {} ".format(df.shape) )


Unnamed: 0,Frequency [Hz],Angle [deg],Chord length [m],FS velocity [m/s],SSD thickness [m],Sound pressure [dB]
0,800,0.0,0.3048,71.3,0.002663,126.201
1,1000,0.0,0.3048,71.3,0.002663,125.201
2,1250,0.0,0.3048,71.3,0.002663,125.951


### Retrieving some basic informations 

Now that you have a DataFrame object `df`, you can explore the kind of information that is stored (beyond the actual data). Using the TAB completion you can get an idea for all the methods and attributes that you may want to use. 

Examples of useful attributes

* `shape` stores the dimensions of the data frame
* `columns` stores the names of the columns 
* `index` stores the names of the rows, by default pandas uses a range from 0 to the number of rows
* `dtypes` stores the `dtype` of each column

Show all of those, check it matches what you expected versus the output of `head` used earlier.

In [2]:
# add your code here to explore the meta-informations of df
series = df['Frequency [Hz]']
print(series)
val = series[10]
print(val)


0         800
1        1000
2        1250
3        1600
4        2000
5        2500
6        3150
7        4000
8        5000
9        6300
10       8000
11      10000
12      12500
13      16000
14        500
15        630
16        800
17       1000
18       1250
19       1600
20       2000
21       2500
22       3150
23       4000
24       5000
25       6300
26       8000
27      10000
28      12500
29        200
        ...  
1473      200
1474      250
1475      315
1476      400
1477      500
1478      630
1479      800
1480     1000
1481     1250
1482     1600
1483     2000
1484     2500
1485     3150
1486     4000
1487      200
1488      250
1489      315
1490      400
1491      500
1492      630
1493      800
1494     1000
1495     1250
1496     1600
1497     2000
1498     2500
1499     3150
1500     4000
1501     5000
1502     6300
Name: Frequency [Hz], Length: 1503, dtype: int64
8000


## Accessing elements in a dataframe

Let's get the 11th value of Frequency using several different approaches:

1. retrieve the series and then access the 10th value
1. using `loc`
1. (bonus) using `iloc`

In [3]:
# add your code here
print(df.loc[10])#location
val2 = df.loc[10, 'Frequency [Hz]']
print(val2)
df.iloc[10,0]#index locaiotn

Frequency [Hz]         8000.000000
Angle [deg]               0.000000
Chord length [m]          0.304800
FS velocity [m/s]        71.300000
SSD thickness [m]         0.002663
Sound pressure [dB]     117.151000
Name: 10, dtype: float64
8000


8000

### Using loc for fancy selections

Using `loc()`, can you retrieve the sub-dataframe with all the columns whose name has strictly more than 15 characters? Call this `df2`. Using `to_csv`, output this as a tab separated file (not comma) and call the file `airfoil2_2.dat`.

(Open the file in an editor to check it matches what you expect).

In [6]:
# add your code here
#df[['Frequency[Hz]', ]]

col_list= []
# for col in df.columns:
#   fi len(col)>15:
#      col_list.append(col)
    
    
col_list = df.loc[:,[c for c in df.columns if len(c)>15]]
#: the slice is instead take every row at the same time(instead of 2 for loops, one for rows one for col)

df.loc[:, col_list] 



KeyError: "['Frequency[Hz]'] not in index"

### Working with a Pd.Series

Retrieve the series corresponding to the sound pressure from the dataframe, display

* show the name of the series
* show the shape attribute of the series (does it correspond to what you expected?)
* the mean and the median
* the mean of the squared values

In [28]:
# add your code here
sp = df['Sound pressure [dB]']
sp.shape
#shape is an attribute

(1503,)

In [8]:
sp.mean
sp.median
#mean is a method

NameError: name 'sp' is not defined

In [30]:
sp.mean()

124.83594278110434

In [31]:
(sp*sp).mean() # or sp**2 use a double power

15631.572408916823