# Book 1: How to read a csv file into a data table

Here we will go over how to use ```pandas``` to read ```.csv``` files into pandas **data frames**, and how to compute simple statistics.

## Simple math with just Python

We will create 2 numeric variables ```a``` and ```b``` and then ```print``` their sum.

In [1]:
a = 3
b = 5

print(a+b)

8


Let's play a bit with strings.

In [2]:
str_1 = "I am "
str_2 = "having fun"

# this works much like string concatenation in IJ macros
print(str_1 + str_2)

# so-called f-strings give you better control over what you print
print(f"{str_1} able to do basic math, and I am {str_2}")

I am having fun
I am  able to do basic math, and I am having fun


As you can see above, using numeric data and strings in Python is simple and similar to the way we write IJ macros. In fact, IJ supports a language called Jython, which you can use to program your macros.
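One detail to keep in mind when mixing text and numbers: Python's ```+``` does not turn a number into text for you, so the number has to be converted with ```str()``` or placed inside an f-string. Below is a minimal sketch; the variable names and the value are made up for illustration.

```python
n_cells = 61            # hypothetical count, for illustration only
label = "Cells found: "

# label + n_cells raises a TypeError, because a str and an int cannot be added
try:
    message = label + n_cells
except TypeError as err:
    message = None
    print(f"Python refuses: {err}")

# option 1: convert the number explicitly
message_1 = label + str(n_cells)

# option 2: let an f-string do the conversion
message_2 = f"{label}{n_cells}"

print(message_1)
print(message_2)
```

Both options produce the same text; f-strings are usually easier to read once several values are involved.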

So why learn Python? Because it is far better suited than IJ macros to data science operations such as statistics, data cleaning, modeling, and plotting. The purpose of this module is just to show you how to start working with Python, taking as an example how to load ```.csv``` files into Python and work with them.

## Using Pandas to read csv files into data tables

Now we will use a well-known package called [pandas](https://pandas.pydata.org/) to load ```.csv``` files into data tables called **data frames**. To install ```pandas``` you can follow this [link](https://anaconda.org/conda-forge/pandas). In short:

* Open the Anaconda Prompt
* Activate your conda environment:

```
> conda activate bias-env
```
* Run the command: ```conda install -c conda-forge pandas```
* Confirm by typing ```y``` when prompted
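After installing, it can be useful to confirm that Python can actually find the package in the active environment. The sketch below uses the standard library's ```importlib``` for this; the helper name ```is_installed``` is made up for this example.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can locate the given package in this environment."""
    return importlib.util.find_spec(package_name) is not None


# should print True once the conda install above has finished
print(f"pandas installed: {is_installed('pandas')}")
```

If this prints ```False```, the most likely cause is that the notebook is running in a different environment from the one where ```pandas``` was installed.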

In [3]:
# the import command makes the pandas library available to me; the "as pd" alias is
# good practice. This way, whenever I want to use a function/method from pandas,
# I always start with pd.
import pandas as pd 

# load the csv file into a data frame; think of it as a table or spreadsheet
df1 = pd.read_csv('./data/Results_01.csv')

# there is a handy method for exploring what is inside a data frame: it's called head
df1.head()

| | | Label | Area | Perim. | Circ. | AR | Round | Solidity |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | blobs.gif | 425 | 90.42641 | 0.65314 | 2.0667 | 0.48386 | 0.88542 |
| 1 | 2 | blobs.gif | 181 | 55.2132 | 0.74611 | 1.77749 | 0.56259 | 0.94517 |
| 2 | 3 | blobs.gif | 656 | 96.52691 | 0.88474 | 1.06472 | 0.93921 | 0.9697 |
| 3 | 4 | blobs.gif | 430 | 79.1127 | 0.86335 | 1.06156 | 0.94201 | 0.95662 |
| 4 | 5 | blobs.gif | 477 | 86.04163 | 0.80968 | 1.56805 | 0.63773 | 0.96657 |
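The table above has two number columns on the left: the pandas row index and the row-number column stored in the file. As a sketch, the ```index_col``` parameter of ```read_csv``` can be used to make the file's own first column the index. The example types a few of the rows above in as text so it runs without the file; it assumes that the header of the first column is a single space, which is what the blank column name suggests.

```python
import io

import pandas as pd

# a few of the rows shown above, typed in as text so the example runs without a file;
# assumption: the row-number column has a single space as its header
csv_text = """ ,Label,Area,Perim.,Circ.,AR,Round,Solidity
1,blobs.gif,425,90.42641,0.65314,2.0667,0.48386,0.88542
2,blobs.gif,181,55.2132,0.74611,1.77749,0.56259,0.94517
3,blobs.gif,656,96.52691,0.88474,1.06472,0.93921,0.9697
"""

# index_col=0 tells pandas to use the first column of the file as the row index
df_small = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_small.head(2))         # head(n) shows the first n rows (the default is 5)
print(df_small.shape)           # (number of rows, number of columns)
print(list(df_small.columns))   # the column names
```

With a real file, the same call would take the path instead of ```io.StringIO(csv_text)```.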


## dtypes:
The ```dtypes``` attribute tells you whether your table was loaded "properly" by listing the data type that pandas assigned to each column:

object -> string or mixed

int64 -> integer

float64 -> float

bool -> logical true/false

In [4]:
df1.dtypes

              int64
Label        object
Area          int64
Perim.      float64
Circ.       float64
AR          float64
Round       float64
Solidity    float64
dtype: object
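To see what a table that was not loaded "properly" looks like, here is a small sketch with made-up data in which one ```Area``` entry is text instead of a number. The whole column then stops being numeric, and ```pd.to_numeric``` with ```errors="coerce"``` is one way to repair it. The labels and the bad entry are invented for illustration.

```python
import io

import pandas as pd

# made-up data: the second Area value is the word "error" instead of a number
csv_text = """Label,Area
blob_a,425
blob_b,error
blob_c,656
"""

df_bad = pd.read_csv(io.StringIO(csv_text))
print(df_bad.dtypes)   # Area is not int64 or float64 here

# a quick yes/no check on whether the column was loaded as numbers
loaded_ok = pd.api.types.is_numeric_dtype(df_bad["Area"])
print(f"Area loaded as numbers: {loaded_ok}")

# errors="coerce" replaces every entry that is not a number with NaN (missing)
df_bad["Area"] = pd.to_numeric(df_bad["Area"], errors="coerce")
print(df_bad.dtypes)   # Area is now numeric, with one missing value
```

Whether replacing bad entries with missing values is acceptable depends on your data; it is often better to find out why the entry was wrong in the first place.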

## Getting basic statistics

Now that we have tabular data, we can do classical operations like calculating the mean value of a particular column. See the example below, but also type your own so you can see the power of auto-completion.

In [5]:
# very basic way of getting the mean
df1["Area"].mean()

np.float64(355.4754098360656)
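Other statistics work the same way as ```mean()```. The sketch below uses only the five rows shown by ```head()``` earlier, typed in by hand so that it is self-contained, so its numbers differ from those of the full table.

```python
import pandas as pd

# the five rows shown by head(), typed in by hand (two columns only)
df_small = pd.DataFrame({
    "Area": [425, 181, 656, 430, 477],
    "Circ.": [0.65314, 0.74611, 0.88474, 0.86335, 0.80968],
})

area_mean = df_small["Area"].mean()
area_median = df_small["Area"].median()
area_std = df_small["Area"].std()      # sample standard deviation

# selecting several columns with a list gives one mean per column
column_means = df_small[["Area", "Circ."]].mean()

print(f"mean: {area_mean:.1f}, median: {area_median:.1f}, std: {area_std:.1f}")
print(column_means)
```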

Now, if you want better control when printing values, I recommend that you get familiar with f-strings. They are a very nice way of controlling how you print information to the user. Below is an example:

In [6]:
# Let us store the mean value into a variable called val
val = df1["Area"].mean()
# then I can use an f-string to print the value, and also format the way it will be printed
# the format is indicated by the text after the :, in this example :.3f means that only
# 3 digits will be printed after the decimal point
print(f"The mean value of Area is: {val:.3f}.")

The mean value of Area is: 355.475.
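The part after the ```:``` can do more than set the number of decimals. A few common variations, as a sketch using the mean value printed above:

```python
val = 355.4754098360656   # the mean of Area printed above

one_decimal = f"{val:.1f}"    # 1 digit after the decimal point
no_decimals = f"{val:.0f}"    # rounded to a whole number
padded = f"{val:10.2f}"       # 2 decimals, right-aligned in a field 10 characters wide
scientific = f"{val:.2e}"     # scientific notation with 2 decimals

print(one_decimal)
print(no_decimals)
print(f"[{padded}]")          # the brackets make the padding visible
print(scientific)
```

The fixed-width form is handy when printing several values underneath each other so that the decimal points line up.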


We can also get a simple summary of statistics with ```describe()```:

In [7]:
df1["Area"].describe()

count     61.000000
mean     355.475410
std      208.090173
min       14.000000
25%      202.000000
50%      368.000000
75%      500.000000
max      886.000000
Name: Area, dtype: float64