# Python Refresher

This is a small repository of examples of useful Python methods and functions that data scientists and analysts often use in real-world scenarios.
It covers methods and functions from libraries such as Matplotlib, NumPy and pandas.

In [None]:
#Import the required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#Importing the csv files
homelessness = pd.read_csv('homelessness.csv')
cars = pd.read_csv('cars.csv')

#Print the first 5 rows
print(homelessness.head())

### Subsetting categorical variables is a breeze by using the .isin method

In [28]:
#Trying the .isin method
canu = ['California', 'Arizona', 'Nevada', 'Utah']

#Filter for Rows in the Mojave Desert states
mojave_states = homelessness[homelessness['state'].isin(canu)]

#print
print(mojave_states)

      region       state  individuals  family_members  state_pop
2   Mountain     Arizona       7259.0          2606.0    7158024
4    Pacific  California     109008.0         20964.0   39461588
28  Mountain      Nevada       7058.0           486.0    3027341
44  Mountain        Utah       1904.0           972.0    3153550
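
The same mask can be inverted to keep the rows that are *not* in the list. The snippet below is a minimal sketch that uses a small made-up DataFrame (`sample`) in place of `homelessness.csv`, so it runs on its own; the `~` operator negates the boolean Series returned by `.isin`.

```python
import pandas as pd

# Hypothetical sample data standing in for homelessness.csv
sample = pd.DataFrame({
    'state': ['Arizona', 'California', 'Nevada', 'Oregon', 'Utah'],
    'region': ['Mountain', 'Pacific', 'Mountain', 'Pacific', 'Mountain'],
})

mojave = ['California', 'Arizona', 'Nevada', 'Utah']

# Boolean mask: True where the state appears in the list
in_mojave = sample['state'].isin(mojave)

# ~ inverts the mask, keeping only the rows NOT in the list
other_states = sample[~in_mojave]
print(other_states)
```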


### Methods to loop through a dictionary, a NumPy array, and a pandas DataFrame/Series

- The items method (for dictionary loops)

In [None]:
#Avoid naming the variable 'dict', which would shadow the built-in type
person = {
    'name': 'Carlos',
    'age': 27
}

for key, val in person.items():
    print(f'The key is {key} and the value is {val}')

- The nditer function (for NumPy arrays, both 1D and 2D)

In [None]:
my_array = np.array([(1,2,3,4,5),(6,7,8,9,10)])

for val in np.nditer(my_array) :
    print(val)
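
To see why `np.nditer` is handy for 2D arrays, it helps to compare it with a plain `for` loop. The sketch below uses a small illustrative array (`grid`, not part of the original notebook): `np.nditer` visits every element, while a plain loop over a 2D array yields whole rows.

```python
import numpy as np

# Small illustrative 2D array
grid = np.array([[1, 2, 3], [4, 5, 6]])

# np.nditer visits every element; each item is a zero-dimensional array,
# so int() is used here to turn it into a plain Python integer
from_nditer = [int(val) for val in np.nditer(grid)]

# A plain for loop over a 2D array yields one row at a time instead
rows = [row.tolist() for row in grid]

print(from_nditer)
print(rows)
```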

- Iterating through a pandas DataFrame by using the iterrows method

In [None]:
print(cars.head())
for lab, row in cars.iterrows() :
    print(lab)
    print(row)

- We can also select variables from the DataFrame by using square brackets

In [None]:
for lab, row in cars.iterrows() :
    print(f"Label {lab}: is {row['cars_per_cap']}")

- We can also use the iterrows method to add new columns (simple, but slow on large DataFrames)

In [None]:
for lab, row in cars.iterrows() :
    cars.loc[lab, 'COUNTRY'] = row['country'].upper()

print(cars)
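
Row-by-row loops work, but a new column can usually be built in one step from a whole column. The sketch below assumes a small made-up DataFrame (`cars_demo`) with the same `country` column as `cars.csv`; it uses the `.str` accessor for string operations and `.apply` for an arbitrary function.

```python
import pandas as pd

# Hypothetical stand-in for cars.csv (values are illustrative only)
cars_demo = pd.DataFrame(
    {'country': ['United States', 'Japan', 'India'],
     'cars_per_cap': [809, 588, 18]},
    index=['US', 'JPN', 'IN'],
)

# Vectorized string method: upper-cases the whole column at once
cars_demo['COUNTRY'] = cars_demo['country'].str.upper()

# .apply calls a function on each value; here the built-in len
cars_demo['name_length'] = cars_demo['country'].apply(len)

print(cars_demo)
```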

## Other useful stuff

How to count occurrences in NumPy arrays

In [None]:
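array = np.array([1,2,3,4,5])

#The comparison builds a boolean array; count_nonzero counts its True values
#print is needed because a notebook cell only displays its last expression
print(np.count_nonzero(array == 2))
print(np.count_nonzero((array == 2) | (array == 3)))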
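
When several values need counting, chaining `|` comparisons gets verbose. As a rough alternative, the sketch below uses `np.isin` to test membership in a list and `np.unique` with `return_counts=True` to tally every distinct value; the `values` array is made up for illustration.

```python
import numpy as np

# Illustrative array with repeated values
values = np.array([1, 2, 2, 3, 3, 3, 5])

# np.isin returns a boolean array: True where the element is 2 or 3
count_two_or_three = np.count_nonzero(np.isin(values, [2, 3]))

# np.unique can return each distinct value together with its count
unique_vals, counts = np.unique(values, return_counts=True)
tally = dict(zip(unique_vals.tolist(), counts.tolist()))

print(count_two_or_three)
print(tally)
```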

## Loc and Iloc practice

In [None]:
#.info prints its report directly and returns None, so it is not wrapped in print
homelessness.info()
print(homelessness[['region', 'individuals']])
print(homelessness.loc[7, 'region'])

In [None]:
print(homelessness.head(8))

#Passing lists of row and column labels to .loc returns a DataFrame, hence the nested brackets: [[...], [...]]
print(homelessness.loc[[1,7], ['state', 'region']])
print(homelessness.iloc[:, [1,2,3]])

#Getting single values
print(homelessness.loc[0, 'state'])

#We can also get a single value without .loc/.iloc by chaining square brackets
print(homelessness['region'][0])
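
With the default 0, 1, 2, ... index, `.loc` and `.iloc` often return the same rows, which hides the difference between them. The sketch below uses a made-up DataFrame (`demo`) with a non-default index to show that `.loc` selects by label and `.iloc` by position, and that `.loc` slices include the end label while `.iloc` slices exclude the end position.

```python
import pandas as pd

# Hypothetical data with an index that does not start at 0
demo = pd.DataFrame(
    {'state': ['Alabama', 'Alaska', 'Arizona'],
     'region': ['East South Central', 'Pacific', 'Mountain']},
    index=[10, 20, 30],
)

# .loc uses labels: the row labelled 20
by_label = demo.loc[20, 'state']

# .iloc uses integer positions: first row, first column
by_position = demo.iloc[0, 0]

# Label slices include the end label; position slices exclude the end
loc_slice = demo.loc[10:20, 'state']
iloc_slice = demo.iloc[0:1, 0]

print(by_label, by_position)
print(len(loc_slice), len(iloc_slice))
```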

### ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Summary Statistics

Several methods give a quick overview of a DataFrame before digging deeper into it.

In [None]:
#We have the .head method, which gives us the first 5 rows of a DataFrame (we can set a specific limit as a parameter)
head = cars.head(7)
print(head)

#The .info method is also very useful: it prints the column names, dtypes and non-null counts
#It prints directly and returns None, so there is nothing to assign or print
cars.info()

#The .describe method returns summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
summary = cars.describe()
print(summary)

#There are also standalone .mean and .median methods
mean = cars['cars_per_cap'].mean()
median = cars['cars_per_cap'].median()
print(f'Mean is {mean} and Median is {median}')
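
Several statistics can also be requested in a single call with `.agg`. The sketch below assumes a made-up `cars_per_cap` column (the numbers are illustrative, not taken from `cars.csv`) and passes a list of statistic names, which returns a Series indexed by those names.

```python
import pandas as pd

# Hypothetical values standing in for the cars_per_cap column
cars_demo = pd.DataFrame({'cars_per_cap': [809, 731, 588, 18, 200]})

# .agg with a list of names computes each statistic in one call
stats = cars_demo['cars_per_cap'].agg(['mean', 'median', 'min', 'max'])

print(stats)
```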