<div class="licence">
<span>Licence CC BY-NC-ND</span>
<span>Valérie Roy</span>
<span><img src="media/ensmp-25-alpha.png" /></span>
</div>

In [None]:
import numpy as np
import pandas as pd

## V) Importing data in pandas

### 1) file formats

   - **pandas** can **import** files in **many formats**
      - CSV, JSON, HTML, Excel, ...
   - see http://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
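A minimal sketch of the idea that one DataFrame can be written to (and read from) several of these formats. It uses a tiny hypothetical DataFrame built from two planets' distances and writes to strings instead of files. Wrapping the JSON text in `io.StringIO` is an assumption made because recent pandas versions expect a path or file-like object in `read_json` rather than a literal string. Excel is left out because it needs an extra package such as openpyxl.

```python
import io

import pandas as pd

# a tiny DataFrame (values taken from the planets example)
small = pd.DataFrame({'distance': [0.387, 0.723]}, index=['Mercury', 'Venus'])

# the same data written in three text formats; with no path, each call returns a string
as_csv = small.to_csv()
as_json = small.to_json()
as_html = small.to_html()

# reading the JSON text back through a file-like buffer
back = pd.read_json(io.StringIO(as_json))

print(as_csv)
print(as_json)
print(back)
```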

### 2) reading and writing **csv** files (comma-separated values)

   - to write a **csv** file, use the **method** $\texttt{pandas.DataFrame.to_csv}$

In [None]:
distance = pd.Series([0.387, 0.723, 30, 1., 5.203, 1.523, 9.6, 19.19],
                     index=['Mercury', 'Venus', 'Neptune', 'Earth', 'Jupiter', 'Mars', 'Saturn', 'Uranus'])

lowest_temp = pd.Series([-200.0, 446.0,  -90.0, -125.0, -140.0],
                        index=['Mercury', 'Venus', 'Earth', 'Jupiter', 'Mars'])

highest_temp = pd.Series([430.0, 490.0, 60.0, 17.0, 20.0],
                         index=['Mercury', 'Venus', 'Earth', 'Jupiter', 'Mars'])

planets = pd.DataFrame({'distance': distance,
                        'lowest temperature': lowest_temp, 
                        'highest temperature': highest_temp, 
                        'origin':'solar system'})

In [None]:
planets

In [None]:
planets.index
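The `planets` DataFrame was built from Series whose indexes differ (only 5 planets have temperatures). A minimal sketch, on a reduced hypothetical subset of the data, of how pandas aligns them: the DataFrame index is the union of the labels, and a missing value becomes `NaN`. This is why Neptune, Saturn and Uranus end up with empty temperature fields in the csv file.

```python
import pandas as pd

# two Series that do not share all their labels (values from the planets example)
distance = pd.Series([0.387, 30.0], index=['Mercury', 'Neptune'])
lowest_temp = pd.Series([-200.0], index=['Mercury'])

# the DataFrame index is the union of both indexes
df = pd.DataFrame({'distance': distance, 'lowest temperature': lowest_temp})

print(df.index.tolist())
# Neptune has no temperature: the cell is NaN (missing), not 0
print(df.loc['Neptune', 'lowest temperature'])
print(df['lowest temperature'].isna().sum())
```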

In [None]:
planets.to_csv('planets.csv', index_label='names', float_format='%.3f')

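   - a file **planets.csv** has been **created** in your current working folder
   - `index_label='names'` gives a **name** to the **row** index; it becomes the header of the first column
   - `float_format='%.3f'` writes every float with **3 decimals**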
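To see exactly what `to_csv` produces without creating a file, you can omit the path: the method then returns the csv text as a string. A minimal sketch on a reduced, hypothetical two-row version of `planets`, showing the effect of `index_label` and `float_format`:

```python
import pandas as pd

# a reduced version of the planets DataFrame
planets = pd.DataFrame({'distance': [1.0, 30.0],
                        'lowest temperature': [-90.0, float('nan')]},
                       index=['Earth', 'Neptune'])

# without a path, to_csv returns the csv text instead of writing a file
text = planets.to_csv(index_label='names', float_format='%.3f')
lines = text.splitlines()

print(lines[0])      # header: the index label comes first
for line in lines[1:]:
    print(line)      # NaN is written as an empty field by default (na_rep='')
```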

   - the csv **format** is very **simple**: a two-dimensional table (a matrix), where:
      - by default, the **first** line is the **columns** header (the **labels** if any, else the integer **indexes**)
      - the **other** lines are the **rows**, written **one below the other**, with values **separated by** ','
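Because the format is this simple, even Python's standard `csv` module can read it. A minimal sketch, using a few hypothetical lines in the `planets.csv` layout held in a string (through `io.StringIO`) instead of the real file: the first line gives the header, and every other line gives one row with as many fields as the header.

```python
import csv
import io

# a few lines in the planets.csv layout
text = """names,distance,lowest temperature,highest temperature,origin
Earth,1.0,-90.0,60.0,solar system
Neptune,30.0,,,solar system
"""

reader = csv.reader(io.StringIO(text))
header = next(reader)        # first line: the column labels
rows = list(reader)          # remaining lines: one row per line

print(header)
for row in rows:
    print(row)               # missing values appear as empty strings
```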

**planets.csv**
   - *names,distance,lowest temperature,highest temperature,origin  
Earth,1.000,-90.000,60.000,solar system  
Jupiter,5.203,-125.000,17.000,solar system  
Mars,1.523,-140.000,20.000,solar system  
Mercury,0.387,-200.000,430.000,solar system  
Neptune,30.000,,,solar system  
Saturn,9.600,,,solar system  
Uranus,19.190,,,solar system  
Venus,0.723,446.000,490.000,solar system*

   - to **read** a **csv** file, use the **function** $\texttt{pandas.read_csv}$ (it is a top-level function, not a DataFrame method)

In [None]:
df = pd.read_csv('planets.csv')

In [None]:
df = df.set_index('names')       # use the 'names' column as the row index
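The two steps above (read, then `set_index`) can also be done in one call with the `index_col` parameter of `read_csv`. A minimal sketch, reading a short hypothetical csv text through `io.StringIO` instead of `planets.csv`, that checks both approaches give the same DataFrame:

```python
import io

import pandas as pd

text = """names,distance,origin
Earth,1.000,solar system
Mars,1.523,solar system
"""

# two steps: read, then promote the 'names' column to the row index
df1 = pd.read_csv(io.StringIO(text)).set_index('names')

# one step: index_col tells read_csv which column holds the row labels
df2 = pd.read_csv(io.StringIO(text), index_col='names')

print(df2.index.name)
print(df1.equals(df2))
```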

#### digression:
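   - you may run into a **general floating-point problem**: a float such as Mars's distance $1.523$ is not guaranteed to come back **exactly** equal after the round trip through $\texttt{to_csv}$ and $\texttt{read_csv}$ (its last binary digits can differ)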
   - https://github.com/pandas-dev/pandas/issues/17154
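When exact equality after a round trip matters, `read_csv` has a `float_precision` parameter; its value `'round_trip'` asks pandas to parse floats with Python's own string-to-float conversion. A minimal sketch, using a hypothetical one-column DataFrame and an in-memory buffer instead of a file; whether the default parser is already exact depends on your pandas version.

```python
import io

import pandas as pd

original = pd.DataFrame({'distance': [0.387, 1.523, 5.203, 19.19]})

# write without float_format: each float is written with enough digits to identify it
text = original.to_csv(index=False)

# 'round_trip' parses the floats with Python's own (exact) converter
back = pd.read_csv(io.StringIO(text), float_precision='round_trip')

print((back['distance'] == original['distance']).all())
```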

In [None]:
planets.loc['Mars', 'distance'], df.loc['Mars', 'distance']   # the value before and after the round trip

In [None]:
df.loc['Mars', 'distance'] == planets.loc['Mars', 'distance']   # exact comparison: may be False

In [None]:
np.isclose(df.loc['Mars', 'distance'], planets.loc['Mars', 'distance'])   # comparison with a tolerance

*trying to get exact equality out of floating-point numbers is generally a losing battle: compare them with a tolerance instead*
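A classic illustration of why a tolerance is needed, independent of csv files: `0.1 + 0.2` is not exactly `0.3` in binary floating point. `np.isclose` tests $|a - b| \le \text{atol} + \text{rtol} \cdot |b|$ (defaults `rtol=1e-05`, `atol=1e-08`), and `np.allclose` applies the same test to whole arrays or Series. The small perturbation `1e-15` below is a hypothetical stand-in for a rounding error.

```python
import math

import numpy as np
import pandas as pd

a = 0.1 + 0.2
b = 0.3

print(a == b)              # False: 0.1, 0.2 and 0.3 are not exactly representable in binary
print(np.isclose(a, b))    # True: the difference is far below the tolerance
print(math.isclose(a, b))  # True: standard-library version (default rel_tol=1e-09)

# the same idea on whole columns
s1 = pd.Series([1.523, 5.203])
s2 = s1 + 1e-15            # a tiny perturbation, like a rounding error
print((s1 == s2).all())    # False
print(np.allclose(s1, s2)) # True
```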

*let's go back to the course*

#### the function $\texttt{pandas.read_csv}$
   - has many optional **parameters** that you can **set**
   - see its help
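A minimal sketch of a few commonly used `read_csv` parameters (`sep`, `index_col`, `usecols`, `na_values`, `nrows`). The semicolon-separated text and the `'unknown'` marker for missing values are hypothetical sample data, read from an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# semicolon-separated text, with 'unknown' used for missing values
text = """names;distance;lowest temperature;origin
Earth;1.000;-90.000;solar system
Neptune;30.000;unknown;solar system
Mars;1.523;-140.000;solar system
"""

df = pd.read_csv(
    io.StringIO(text),
    sep=';',                          # field separator (default is ',')
    index_col='names',                # column used as the row index
    usecols=['names', 'distance', 'lowest temperature'],  # keep only these columns
    na_values=['unknown'],            # extra strings to treat as NaN
    nrows=2,                          # read only the first 2 data rows
)
print(df)
```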

In [None]:
#pd.read_csv?

In [None]:
#pd.DataFrame.to_csv?
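The `?` syntax only works in IPython/Jupyter. In a plain Python script, `help(pd.read_csv)` prints the same documentation, and the standard `inspect` module can list the parameter names. A minimal sketch:

```python
import inspect

import pandas as pd

read_params = inspect.signature(pd.read_csv).parameters
write_params = inspect.signature(pd.DataFrame.to_csv).parameters

print(len(read_params), 'parameters for read_csv')
print(list(read_params)[:5])
print(len(write_params), 'parameters for DataFrame.to_csv')
```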