# Outline of data analysis with Pandas:
***


1. Importing data (from FIJI results table or any other source)
*** 
2. Previewing Data
    * Head and tail
    * Data validation
    * Statistics of data set
        * *Min, max, standard deviation, etc.*
***    
3. Working with Data
    * loc vs iloc
    * Selecting columns
        * *Statistics on column*
    * Adding data to data frame
    * Performing operations on a column
***
4. Quick plotting for data preview
***

### Importing data
Python has a built-in `csv` module for importing csv files, which we can use here to read the data in as lists.
***

In [None]:
# Read the csv file using Python's built-in csv module
import csv
areas_list = []  #initialize an empty list to hold the area values
with open('Results_random_circles_CoronaTime.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        #print(row)
        #print(row[0])
        print(row[0], row[1])
        areas_list.append(row[1])
        
        
        


In [None]:
#look at just the areas.  How do we slice to get rid of the first entry?


In [None]:
#What happens if I try to add a number to the first entry?


In [None]:
#The list of areas is a list of strings, we need to change it to a list of floating point numbers


#Calculate the max of just the areas 



In [None]:
areas
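One possible way to work through the empty cells above is sketched here. It is only a sketch: `sample_csv` is a small made-up stand-in for the results file, and the names `area_strings` and `areas` are assumptions chosen for illustration.

```python
import csv
import io

# Hypothetical stand-in for the results file: a header row, then index and area
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"

areas_list = []
for row in csv.reader(io.StringIO(sample_csv), delimiter=','):
    areas_list.append(row[1])

# Slicing from position 1 drops the header entry 'Area'
area_strings = areas_list[1:]

# Adding a number to an entry fails, because csv.reader returns text
try:
    area_strings[0] + 1
except TypeError as err:
    print('Cannot add a number to a string:', err)

# Convert each string to a floating point number, then take the max
areas = [float(a) for a in area_strings]
print(max(areas))
```

The list comprehension builds a new list and leaves `areas_list` untouched, so the original strings remain available for comparison.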

#### Exercise

Calculate the min of the areas

---

#### Use the statistics package to determine the standard deviation of the areas

In [None]:
#Calculate the standard deviation of the areas

from statistics import stdev
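A minimal sketch of the min and standard deviation steps, assuming `areas` is already a list of floats. The values below are made up for illustration.

```python
from statistics import stdev

# Hypothetical areas, already converted from strings to floats
areas = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(min(areas))

# stdev is the sample standard deviation (it divides by n - 1);
# statistics.pstdev would give the population version instead
print(stdev(areas))
```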


### With Pandas we can read the csv file straight into a dataframe, a shortcut compared with looping over the rows using the csv module
***

In [None]:
#Input csv file as dataframe
import pandas as pd
import numpy as np

#Read in data as data frame
df_old = pd.read_csv('Results_random_circles_CoronaTime.csv', skiprows=range(21,101))  #skip file lines 21 to 100 (counting from 0, where line 0 is the header)
df_old.head()

### When we import the data, pandas generates an index for the rows so that we can select them.
What does setting the *index_col=0* argument do?  What happens if we change it to another column?
 ***

In [None]:
df = pd.read_csv('Results_random_circles_CoronaTime.csv', skiprows=range(21,101), index_col=0)
#What's the difference between the two indices?
df.head()
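The effect of `index_col` and `skiprows` can be seen on a small in-memory table. This is a sketch only: `sample_csv` is a made-up stand-in for the results file, and `df_short` is a name introduced here for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area,Mean\n1,314.0,255\n2,78.5,255\n3,201.0,255\n4,50.3,255\n"

# Default: pandas adds its own index starting at 0 and keeps the first column as data
df_old = pd.read_csv(io.StringIO(sample_csv))

# index_col=0: the first column of the file becomes the row labels
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# skiprows with a range skips those file lines, counting from 0 (line 0 is the header)
df_short = pd.read_csv(io.StringIO(sample_csv), index_col=0, skiprows=range(3, 5))

print(list(df_old.index), df_old.shape)
print(list(df.index), df.shape)
print(list(df_short.index), df_short.shape)
```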

## We can find out information about the dataframe as a whole using pandas functions:
***

In [None]:
df.info()  #find out information about the whole dataset (i.e. the data type and number of non-null entries in each column)

In [None]:
df.shape  #find the number of rows and columns

### Exercises: find the min, max, mean and standard deviation of the areas in the dataframe
***

In [None]:
#Find the min area of the dataset



In [None]:
#Find the max



In [None]:
#Find the mean


In [None]:
#Find the standard deviation of the dataset


In [None]:
 #find out all the stats in one go 
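One way to approach these exercises is sketched below, using a made-up table in place of the results file. The dataframe methods shown act on every numeric column at once.

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area,Mean\n1,314.0,255\n2,78.5,255\n3,201.0,255\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

print(df.min())    # smallest value in each column
print(df.max())    # largest value in each column
print(df.mean())   # mean of each column
print(df.std())    # sample standard deviation (divides by n - 1) by default

# describe() reports count, mean, std, min, quartiles and max in one go
stats = df.describe()
print(stats)
```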

### Showing a single row or single value: loc vs iloc
***


In [None]:
df.head()

In [None]:
 #loc looks for a particular label on an index

In [None]:
#iloc is integer loc, looks at the specific POSITION of the index.
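The difference between the two is easiest to see when the index labels do not start at 0. The sketch below uses a made-up table whose labels start at 1, as they would after reading with `index_col=0`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file; row labels are 1, 2, 3
sample_csv = " ,Area,Mean\n1,314.0,255\n2,78.5,255\n3,201.0,255\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

row_by_label = df.loc[1]       # the row whose index LABEL is 1 (the first row here)
row_by_position = df.iloc[1]   # the row at POSITION 1 (the second row)

# A single value: loc takes labels, iloc takes integer positions
value_by_label = df.loc[2, 'Area']
value_by_position = df.iloc[1, 0]

print(row_by_label['Area'], row_by_position['Area'])
print(value_by_label, value_by_position)
```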


## Selecting a single column using indexing
* Single and double square bracket indexing
***

In [None]:
df.head()

In [None]:
#view the first few rows of area, comes out as a series

In [None]:
#view the first few rows of area, as a dataframe
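A short sketch of the two indexing styles, again on a made-up table. The names `area_series` and `area_frame` are assumptions for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area,Mean\n1,314.0,255\n2,78.5,255\n3,201.0,255\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Single brackets with a column name return a Series
area_series = df['Area'].head()

# Double brackets (a list of column names) return a DataFrame
area_frame = df[['Area']].head()

print(type(area_series))
print(type(area_frame))
```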




## Working with the data, statistics of a column
Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html

In [None]:
df[['Area']].describe()

In [None]:
df.Area.describe()

In [None]:
 #Finding the column min using a dataframe approach

In [None]:
  #Finding the column min using a series approach
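The two approaches give the same number in different containers. A minimal sketch on made-up data, with `min_frame` and `min_series` as illustrative names:

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area,Mean\n1,314.0,255\n2,78.5,255\n3,201.0,255\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Dataframe approach: the result is a Series with one entry labelled 'Area'
min_frame = df[['Area']].min()

# Series approach: the result is a single number
min_series = df['Area'].min()

print(min_frame)
print(min_series)
```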

### Exercise: Find the min, max and mean of the area column, set them equal to area_min, area_max and area_mean, and make an output print statement saying what they are

In [None]:
#Find the min, max and mean of the area column, set them equal to area_min, area_max and area_mean, and make an
#output print statement saying what they are

area_min = df['Area'].min()


print('The minimum area of a found circle is:', area_min, 'square pixels')
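The remaining two values follow the same pattern as `area_min`. A possible completion is sketched here on a made-up table rather than the real results file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

area_min = df['Area'].min()
area_max = df['Area'].max()
area_mean = df['Area'].mean()

print('The minimum area of a found circle is:', area_min, 'square pixels')
print('The maximum area of a found circle is:', area_max, 'square pixels')
print('The mean area of a found circle is:', area_mean, 'square pixels')
```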





### Formatting numbers

You can format a number using a format spec (see https://pyformat.info/).  
"{:4.1f}" gives us at least four characters, with one after the decimal point.  



In [None]:
#example
test_number = 1234.54927  #generate a test number
test_number
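A few format specs applied to the same kind of number show how the width and precision parts behave. This is a small sketch; the variable names are chosen here for illustration.

```python
test_number = 1234.54927

# Precision: three digits after the decimal point
three_decimals = "{:1.3f}".format(test_number)

# Width is a MINIMUM total width: a larger width pads on the left with spaces
padded = "{:12.3f}".format(test_number)

# A short number with width 4 and one decimal place gets one leading space
short = "{:4.1f}".format(3.14159)

print(repr(three_decimals))  # '1234.549'
print(repr(padded))          # '    1234.549'
print(repr(short))           # ' 3.1'
```

A width smaller than the number itself, as in `{:1.3f}`, has no effect, because the number is never truncated to fit.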

In [None]:
"{:1.3f}".format(test_number)  #the number before the decimal point is the minimum total width; making it larger than the number is long pads the output with white space

In [None]:
# Format the numbers to one decimal place
print('The minimum area of a found circle is:',"{:1.1f}".format(area_min), 'square pixels')
print('The maximum area of a found circle is:', "{:1.1f}".format(area_max), 'square pixels')
print('The mean area of a found circle is:', "{:1.1f}".format(area_mean), 'square pixels')


## Calculations on a column
### Can we find the radii of the shapes?
***

In [None]:
#Can we find the radii of the shapes?
#recall area=pi*r^2
#r=sqrt(area/pi)

# This makes a new list of just the radii


#

#### Perform a mathematical operation and add a column to a dataframe

In [None]:
#
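Because operations on a column apply to every row at once, the radius formula r = sqrt(area/pi) can be written in one line and assigned to a new column. The sketch below uses made-up data, and the column name `Radius` is an assumption.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# The arithmetic is applied element by element, giving a Series of radii
radii = np.sqrt(df['Area'] / np.pi)

# Assigning to a new column name adds the column to the dataframe
df['Radius'] = radii

print(df.head())
```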


### Removing columns from dataframe
***

In [None]:
#Delete the column for radius?
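One way to remove a column is `DataFrame.drop`, sketched here on made-up data with a hypothetical `Radius` column.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for the results file, with a Radius column added
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)
df['Radius'] = np.sqrt(df['Area'] / np.pi)

# drop returns a new dataframe without the column, so assign the result back
df = df.drop(columns=['Radius'])

print(df.head())
```

`del df['Radius']` is an alternative that removes the column in place.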


In [None]:
df.head()

In [None]:
#Can we add the radius to the data frame directly?


### Exercise: compute the diameter and add a diameter column to the data frame
***

In [None]:
#Exercise: compute the diameter and add a diameter column to the data frame, then look at the first five rows of the dataframe
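A possible solution sketch, assuming the diameter is taken as twice the radius derived from the area. The data and the column name `Diameter` are made up for illustration.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# diameter = 2 * r, with r = sqrt(area / pi)
df['Diameter'] = 2 * np.sqrt(df['Area'] / np.pi)

print(df.head())
```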


### Exercise: find the mean and standard deviation of the radii.  How do they compare to what we put in to our sample image?
***

In [None]:
#Find the mean and standard deviation of the radii, compare to the inputs
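A sketch of one way to get both numbers, on made-up data. The input values used to generate the sample image are not given here, so the comparison itself is left to the reader.

```python
import io
import numpy as np
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

radii = np.sqrt(df['Area'] / np.pi)

radius_mean = radii.mean()
radius_std = radii.std()   # sample standard deviation (divides by n - 1)

# agg computes several statistics in one call
print(radii.agg(['mean', 'std']))
```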



### Use the same syntax to find just the standard deviation of the radii
***

In [None]:
#find the standard deviation


# Quick plot for seeing what we have
***

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
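A histogram is one quick way to preview a column. The sketch below is written to run outside a notebook, so it selects the non-interactive `Agg` backend; inside a notebook, `%matplotlib inline` takes its place. The data and axis label are made up for illustration.

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed with %matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the results file
sample_csv = " ,Area\n1,314.0\n2,78.5\n3,201.0\n4,50.3\n5,153.9\n"
df = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# Series.plot draws with matplotlib and returns the Axes object
ax = df['Area'].plot(kind='hist', bins=5, title='Distribution of areas')
ax.set_xlabel('Area (square pixels)')

plt.show()
```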
