# Read/Save CSV data files using Pandas

We can use the Python package Pandas to <br>
(1) read and save data files. <br>
(2) visualize data; the visualization functions in Pandas are built on Matplotlib. <br>
(3) combine different data files (DataFrames) into one file (DataFrame). <br>
<br>
Pandas uses the data type ``DataFrame`` to represent a table. A ``DataFrame`` can be seen as an enhanced version of a 2D NumPy array. <br>
Each column of the DataFrame has an index (a string, an integer, or another object). <br>
Each row of the DataFrame has an index (a string, an integer, or another object). <br>
<br>
Pandas has another data type, ``Series``, which is an enhanced version of a 1D NumPy array. <br>
Each element in a ``Series`` has an index (a string, an integer, or another object).
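Item (3) above, combining DataFrames, can be sketched with ``pd.concat``. The two small tables and their column names (``ID``, ``Age``) are made up for illustration; this is one simple way to stack tables that share the same columns, not the only one.

```python
import pandas as pd

# Two small hypothetical tables with the same columns
df_a = pd.DataFrame({'ID': [1, 2], 'Age': [20, 35]})
df_b = pd.DataFrame({'ID': [3, 4], 'Age': [41, 28]})

# Stack the rows of df_b under df_a; ignore_index=True renumbers the rows from 0
combined = pd.concat([df_a, df_b], ignore_index=True)
print(combined)
```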

In [None]:
import numpy as np
import pandas as pd

## Series Data Object in Pandas

In [None]:
data = pd.Series([0.1, 0.2, 0.3, 0.4])
data

In [None]:
type(data)

We can get an element of a ``Series``  using its index

In [None]:
data[0]

We can get a sub-series using element indexes in a ``Series``

In [None]:
a = data[0:2] # type(a) is pandas.core.series.Series
a

We can convert a ``Series`` into a 1D NumPy array using the attribute ``.values``

In [None]:
a = data.values # type(a) is numpy.ndarray
a
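As an alternative sketch, Pandas also provides the method ``Series.to_numpy()``, which returns a NumPy array as well. The variable names ``s`` and ``arr`` are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4])

# to_numpy() is a method call, while .values is an attribute
arr = s.to_numpy()
print(type(arr), arr.dtype)
```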

### ``Series`` is similar to Python Dictionary and NumPy array

Each element in a ``Series`` has an index (usually, a string-index or an integer-index) <br>
We can access an element using its integer position : similar to a NumPy array <br>
We can access an element using a string-index : similar to a Python dictionary

In [None]:
data = pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])
data

A ``Series`` has an attribute ``index``, which is an array-like object

In [None]:
# get the string-index of each element
[data.index[0], data.index[1], data.index[2], data.index[3]]

Get the element using the string-index

In [None]:
data['b']

Get the element using the integer-index

In [None]:
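data.iloc[1] # positional access; plain data[1] is deprecated in recent Pandas when the index holds strings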

We can use non-contiguous indexes in a ``Series``

In [None]:
data = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])
data

In [None]:
data[-1] # it is not the last element

In [None]:
data[-1:101] # an integer slice inside [] is treated as positions, not labels; do not use this notation to get a sub-series
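To avoid the ambiguity between labels and positions, a minimal sketch is to use ``.loc`` (labels) and ``.iloc`` (positions) explicitly. The variable names below are only for illustration.

```python
import pandas as pd

s = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])

by_label = s.loc[-1]         # the element whose label is -1 (the first element)
by_position = s.iloc[-1]     # the last element, selected by position
label_slice = s.loc[-1:100]  # label-based slice: both end labels are included
pos_slice = s.iloc[0:2]      # position-based slice: the end position is excluded
print(by_label, by_position)
```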

In [None]:
data1 = pd.Series([0.1, 0.2, 0.3, 0.4]) # we do not specify index here
# it is the same as
data2 = pd.Series([0.1, 0.2, 0.3, 0.4], index = [0, 1, 2, 3]) # indexes are contiguous from 0

In [None]:
data1

In [None]:
data2

### Create a Series from a Python Dictionary

In [None]:
patient_info = {'Age': 20,
                'Blood_Type': 'O',
                'sex': 'M',
                'Address': 'Base0, Mars',
                'Phone': '001001001',
                'Diagnosis': 'bone fracture in foot'}
#
patient_info = pd.Series(patient_info)
print(patient_info)
print('type(patient_info) is', type(patient_info))

In [None]:
patient_info.iloc[4] # positional access; plain patient_info[4] is deprecated in recent Pandas

In [None]:
patient_info['Phone']

In [None]:
patient_info[0:4] # patient_info[4]/['Phone'] is not included

``Series`` supports slicing using strings as the start index and the end index

In [None]:
patient_info['Age':'Phone']
# ['Phone'] is included: this is inconsistent with the above integer-index notation
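The difference between the two slicing rules can be made explicit with ``.loc`` and ``.iloc``. The short ``Series`` below is a reduced, illustrative version of the patient record above.

```python
import pandas as pd

info = pd.Series({'Age': 20, 'Blood_Type': 'O', 'sex': 'M', 'Phone': '001001001'})

by_label = info.loc['Age':'sex']  # label slice: the end label 'sex' is included
by_position = info.iloc[0:2]      # position slice: position 2 ('sex') is excluded
print(list(by_label.index))
print(list(by_position.index))
```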

## DataFrame Object in Pandas 
A DataFrame is usually used to represent a table <br>
The values of a table form a matrix (2D NumPy array) <br>
Each row of the table has an index (usually, a string-index or an integer-index) <br>
Each column of the table has an index (usually, a string-index or an integer-index) <br>

In [None]:
Matrix = [[1, 2],
          [3, 4],
          [5, 6]]

In [None]:
df = pd.DataFrame(Matrix, columns=['ColumnA', 'ColumnB'], index=['RowA', 'RowB', 'RowC']) 
print('type(df)', type(df))
df

In [None]:
type(df)

In [None]:
df.columns

In [None]:
df.index

In [None]:
# get the first column using its identifier/name
df['ColumnA']

In [None]:
# df['RowA'] raises a KeyError: [] with a label looks up columns, not rows
# use df.loc to get a row by its identifier/name
df.loc['RowA']

Get the first row using ``df.iloc`` with an integer-index

In [None]:
df.iloc[0,:]

In [None]:
type(df.iloc[0,:])

Get the first column using ``df.iloc`` with an integer-index

In [None]:
df.iloc[:,0]

In [None]:
type(df.iloc[:,0])

Get an element in the DataFrame using ``df.iloc`` with integer-indexes

In [None]:
df.iloc[0,1]
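The same selections can also be written with row and column names using ``df.loc``. This is a small sketch that rebuilds the table from above so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

row = df.loc['RowA', :]              # first row, returned as a Series
col = df.loc[:, 'ColumnA']           # first column, returned as a Series
element = df.loc['RowA', 'ColumnB']  # a single element
print(element)
```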

In [None]:
df

## Convert a DataFrame to a NumPy Array using ``DataFrame.values``

In [None]:
A = df.values
A

In [None]:
type(A)

# Load data from a CSV file

A CSV file contains comma-separated values (CSV) <br>
https://en.wikipedia.org/wiki/Comma-separated_values

In [None]:
df = pd.read_csv('patient_record.csv', sep=',') # in the file the values are separated by ,
df
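If ``patient_record.csv`` is not at hand, the behavior of ``pd.read_csv`` can be tried on text held in memory. The column names and values below (``ID``, ``Sex``, ``TumorSize``) are assumptions made for illustration; the actual contents of the file are not shown in this notebook.

```python
import io
import pandas as pd

# Hypothetical CSV text: the first line is the header, the other lines are rows
csv_text = """ID,Sex,TumorSize
1,M,2.5
2,F,1.0
3,M,0.7
"""

# read_csv accepts a file path or a file-like object such as io.StringIO
df_demo = pd.read_csv(io.StringIO(csv_text), sep=',')
print(df_demo.dtypes)
```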

In [None]:
df.columns

In [None]:
df.index

In [None]:
#convert the dataframe to a numpy array
data=df.values
data

We convert M to 0 and convert F to 1 to get a numeric array

In [None]:
data[data=='M']=0 # a boolean mask selects every entry equal to 'M'
data[data=='F']=1 # a boolean mask selects every entry equal to 'F'
data
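The same conversion can be sketched directly on a DataFrame column with ``Series.map``, before converting to a NumPy array. The table and its column names are hypothetical stand-ins for the patient record.

```python
import pandas as pd

# Hypothetical stand-in for the patient record
df_demo = pd.DataFrame({'ID': [1, 2, 3],
                        'Sex': ['M', 'F', 'M'],
                        'TumorSize': [2.5, 1.0, 0.7]})

# Replace each label in the 'Sex' column using a dictionary
df_demo['Sex'] = df_demo['Sex'].map({'M': 0, 'F': 1})

# All columns are numeric now, so the array can be float64 directly
data_demo = df_demo.to_numpy(dtype='float64')
print(data_demo)
```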

### Change the data type from 'object' to 'float64'

In [None]:
data=data.astype('float64')
data

# Process the data
Assume that, after brain surgery, the tumors of the male patients have been removed.

In [None]:
data_new = data.copy()
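# assume we can use one line of code to remove the tumors of the male patients
# column 1 holds the sex code (0 for male, 1 for female), so multiplying by it
# sets column 2 (assumed to hold the tumor measurement) to 0 for the male patients
data_new[:,2]=data_new[:,1]*data_new[:,2]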
data_new
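A sketch of the same step done on a DataFrame, without converting the labels to numbers first: a boolean mask inside ``.loc`` selects the rows of the male patients. The table and column names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the patient record
df_demo = pd.DataFrame({'ID': [1, 2, 3],
                        'Sex': ['M', 'F', 'M'],
                        'TumorSize': [2.5, 1.0, 0.7]})

df_proc = df_demo.copy()  # keep the original table unchanged

# Select the rows where Sex is 'M' and set their TumorSize to 0
df_proc.loc[df_proc['Sex'] == 'M', 'TumorSize'] = 0.0
print(df_proc)
```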

In [None]:
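data_new = data_new.astype('object') # change the data type from float64 to object (to store str object)
# recover the labels from the sex code in column 1 (not from the tumor column,
# which would mislabel a female patient whose value happens to be 0)
is_male = data_new[:,1]==0 # compute the mask before overwriting column 1
data_new[is_male,1]='M'
data_new[~is_male,1]='F'
data_new[:,0]=data_new[:,0].astype('int64')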
data_new

# Save the data to a csv file

In [None]:
#create a new Dataframe using data_new and the original column/row indexes
df_new = pd.DataFrame(data_new, columns=df.columns, index=df.index) 
df_new

In [None]:
#save the new Dataframe df_new to a csv file
#set index=False, so the row indexes will not be saved  
df_new.to_csv('patient_record_new.csv', index=False, sep=',')
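To see what ``index=False`` does without writing a file, a small sketch can save to an in-memory buffer and read the text back. The table below is hypothetical.

```python
import io
import pandas as pd

# Hypothetical table to save
df_demo = pd.DataFrame({'ID': [1, 2], 'Sex': ['M', 'F'], 'TumorSize': [0.0, 1.0]})

# to_csv accepts a file path or a writable text buffer
buffer = io.StringIO()
df_demo.to_csv(buffer, index=False, sep=',')
csv_text = buffer.getvalue()
print(csv_text)  # the header line contains only the column names, no row index

# Read the saved text back into a new DataFrame
df_back = pd.read_csv(io.StringIO(csv_text))
```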