<img src="https://i.ibb.co/TcVMz44/logo.jpg" alt="logo" border="0" width=200>

# Computational Astrophysics 2021
---
## Eduard Larrañaga

Observatorio Astronómico Nacional\
Facultad de Ciencias\
Universidad Nacional de Colombia

---

## 02. Reading .csv files

### About this notebook

In this worksheet we use the `pandas` package to read a dataset given in a .csv file. 

---

### Reading the data

Datasets are often distributed as .csv files. For example, consider the dataset reported by Greene and Ho (2006), which contains the features of 88 galaxies.

Greene, J. E. and Ho, L. C. *The MBH − σ∗ Relation in Local Active Galaxies*. ApJ 641 L21 (2006)
https://ui.adsabs.harvard.edu/abs/2006ApJ...641L..21G/abstract

We give here a .csv version of this data set.

---

### Open the .csv file

Since the dataset is a .csv file, we use pandas to read the file and take a look at the first elements. Using the function `read_csv`, we assign the contents of the file to the variable `df`, which we will call the **dataframe**.

Detailed information on this function can be found at

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

In [None]:
path = ''  # Empty string, used when working locally

In [None]:
# Working in Google Colab requires mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
# Define the path to the files
path = '/content/drive/MyDrive/Colab Notebooks/CA2021/02. Astrophysics Data/presentation/02.CSVFiles/'

In [None]:
import numpy as np
import pandas as pd

In [None]:
df = pd.read_csv(path+'data.csv')
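
To try `read_csv` without access to `data.csv`, the following minimal sketch reads a few rows from an in-memory string. The rows are copied from the output shown in this notebook; the assumption that the file has exactly this header and no extra index column is ours, and `csv_text` and `sample` are names introduced only for illustration.

```python
import io
import pandas as pd

# Three rows copied from the notebook output; the file layout is assumed.
csv_text = """Name,z,sigma*,e_sigma*,n_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
SDSS J000805.62+145023.4,0.0454,140.0,27.0,,7610.0,380.0,41.13,0.04,7.7,,0.1
SDSS J004236.86-104921.8,0.0419,78.4,10.0,,1960.0,97.0,41.58,0.14,6.7,,0.1
SDSS J020459.25-080816.0,0.0772,121.0,9.4,a,3720.0,180.0,41.13,0.05,7.0,,0.1
"""

# read_csv accepts a file path or any file-like object, such as StringIO.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (rows, columns)
print(sample.dtypes)  # empty fields are read as NaN
```

Empty fields, such as those in the `E_logM` column, are read as missing values (`NaN`).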

To take a look at the first elements of the dataframe, we use the method `.head()`.

In [None]:
df.head()

Unnamed: 0,Name,z,sigma*,e_sigma*,n_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
0,SDSS J000805.62+145023.4,0.0454,140.0,27.0,,7610.0,380.0,41.13,0.04,7.7,,0.1
1,SDSS J004236.86-104921.8,0.0419,78.4,10.0,,1960.0,97.0,41.58,0.14,6.7,,0.1
2,SDSS J011703.58+000027.3,0.0456,98.8,16.0,,2270.0,110.0,41.45,0.08,6.8,,0.1
3,SDSS J020459.25-080816.0,0.0772,121.0,9.4,a,3720.0,180.0,41.13,0.05,7.0,,0.1
4,SDSS J020615.99-001729.1,0.0426,216.0,30.0,,3860.0,190.0,41.91,0.07,7.5,,0.1
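
By default `.head()` returns the first five rows. It also accepts the number of rows as an argument, and `.tail()` works the same way from the end of the dataframe. A minimal sketch, using a small hypothetical dataframe `toy` built from the first seven `z` and `sigma*` values shown in this notebook:

```python
import pandas as pd

# Hypothetical stand-in for df, built from values shown in this notebook.
toy = pd.DataFrame({
    "z": [0.0454, 0.0419, 0.0456, 0.0772, 0.0426, 0.0414, 0.0618],
    "sigma*": [140.0, 78.4, 98.8, 121.0, 216.0, 122.0, 174.0],
})

first_two = toy.head(2)   # first two rows
last_three = toy.tail(3)  # last three rows

print(len(toy.head()))    # default number of rows returned by head()
print(first_two)
print(last_three)
```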


The `.describe()` method summarizes the content of the dataset.

In [None]:
df.describe()

Unnamed: 0,z,sigma*,e_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
count,88.0,88.0,88.0,71.0,71.0,71.0,71.0,88.0,15.0,88.0
mean,0.048665,117.142045,11.805682,3206.056338,210.760563,41.504225,0.078028,6.86625,0.140667,0.189886
std,0.032562,48.285108,5.308383,1759.679743,191.219953,0.663268,0.0417,0.72825,0.074303,0.17247
min,0.000947,30.0,2.9,810.0,41.0,40.1,0.03,4.9,0.02,0.02
25%,0.02775,87.025,7.75,1905.0,110.0,41.155,0.05,6.3,0.1,0.1
50%,0.04225,113.5,12.0,2970.0,160.0,41.51,0.07,7.0,0.12,0.1
75%,0.0622,139.25,15.0,3870.0,210.0,41.86,0.09,7.4075,0.17,0.2
max,0.184,268.0,30.0,8240.0,1190.0,43.61,0.2,8.52,0.31,1.06


The dataframe includes data from 88 samples. For each numerical feature, the summary shows the mean, the standard deviation, the minimum and maximum values, and the quartiles.
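
In the summary above, the `count` row is 71 for `FWHM` but 88 for `z`. This happens because `.describe()` counts only non-missing values, so a smaller count signals missing entries. A minimal sketch of this behavior, using a hypothetical dataframe `toy` in which one `NaN` has been placed purely for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data; the NaN is inserted here only to illustrate the count.
toy = pd.DataFrame({
    "z": [0.0454, 0.0419, 0.0456, 0.0772],
    "FWHM": [7610.0, 1960.0, np.nan, 3720.0],
})

summary = toy.describe()
print(summary.loc["count"])  # non-missing values per column
print(toy.isna().sum())      # missing values per column
```

The statistics of each column are computed from its non-missing values only.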

It is also possible to select a range of samples, a single sample, or a single column.

In [None]:
df[5:15]

Unnamed: 0,Name,z,sigma*,e_sigma*,n_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
5,SDSS J021011.49-090335.5,0.0414,122.0,12.0,a,2450.0,120.0,41.24,0.07,6.7,,0.1
6,SDSS J021257.59+140610.1,0.0618,174.0,12.0,a,3080.0,150.0,41.58,0.06,7.1,,0.1
7,SDSS J033013.26-053236.0,0.0131,99.2,11.0,,5160.0,250.0,40.45,0.06,7.0,,0.1
8,SDSS J075057.25+353037.5,0.176,154.0,14.0,a,2970.0,200.0,41.63,0.03,7.2,,0.1
9,SDSS J080243.39+310403.3,0.0409,151.0,17.0,,5360.0,260.0,41.67,0.07,7.6,,0.1
10,SDSS J080538.66+261005.4,0.017,100.0,14.0,,3110.0,150.0,40.14,0.04,6.3,,0.2
11,SDSS J082510.23+375919.7,0.0214,98.7,12.0,,1830.0,91.0,40.42,0.04,6.0,,0.1
12,SDSS J083202.16+461425.7,0.0459,104.0,13.0,,1450.0,72.0,41.17,0.08,6.2,,0.1
13,SDSS J083949.64+484701.4,0.0394,133.0,12.0,,1480.0,73.0,41.31,0.08,6.3,,0.2
14,SDSS J085554.27+005110.9,0.0524,118.0,11.0,a,2910.0,140.0,41.27,0.06,6.9,,0.1


In [None]:
df[16:17]

Unnamed: 0,Name,z,sigma*,e_sigma*,n_sigma*,FWHM,e_FWHM,logL,e_logL,logM,E_logM,e_logM
16,SDSS J093259.60+040506.0,0.059,70.5,6.7,a,3550.0,170.0,41.14,0.05,7.0,,0.1
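
A slice such as `df[16:17]` returns a dataframe with a single row. When the row itself is needed, position-based indexing with `.iloc` returns it as a Series indexed by column name. A minimal sketch on a small hypothetical dataframe `toy`, assuming the default integer index:

```python
import pandas as pd

# Hypothetical stand-in for df, built from values shown in this notebook.
toy = pd.DataFrame({
    "z": [0.0454, 0.0419, 0.0456],
    "sigma*": [140.0, 78.4, 98.8],
})

one_row_frame = toy[1:2]      # slice: a DataFrame with one row
one_row_series = toy.iloc[1]  # position: a Series indexed by column name

print(type(one_row_frame).__name__, one_row_frame.shape)
print(type(one_row_series).__name__)
print(one_row_series["sigma*"])
```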


In [None]:
df['z']

0     0.045400
1     0.041900
2     0.045600
3     0.077200
4     0.042600
        ...   
83    0.003320
84    0.017200
85    0.016300
86    0.021800
87    0.000947
Name: z, Length: 88, dtype: float64
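
Selecting one column by its label returns a pandas Series, so reductions such as `.mean()` or `.max()` can be applied to it directly. Passing a list of labels instead returns a dataframe. A minimal sketch, using a hypothetical dataframe `toy` with the first five values shown in this notebook:

```python
import pandas as pd

# Hypothetical stand-in for df, built from values shown in this notebook.
toy = pd.DataFrame({
    "z": [0.0454, 0.0419, 0.0456, 0.0772, 0.0426],
    "sigma*": [140.0, 78.4, 98.8, 121.0, 216.0],
})

z_column = toy["z"]     # single label: a Series
z_frame = toy[["z"]]    # list of labels: a DataFrame

print(type(z_column).__name__, type(z_frame).__name__)
print(z_column.mean(), z_column.max())
```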

In [None]:
df[['z', 'sigma*']]

Unnamed: 0,z,sigma*
0,0.045400,140.0
1,0.041900,78.4
2,0.045600,98.8
3,0.077200,121.0
4,0.042600,216.0
...,...,...
83,0.003320,96.8
84,0.017200,198.0
85,0.016300,133.0
86,0.021800,36.0
