# Learning Python

This is the comprehensive companion notebook for our Analytical Ascent: Learning Python blog series, which can be found on our website [here](https://analyticalascent.com/python-for-beginners-data-science/). Each section below carries the Part number of its associated blog post to simplify referencing back and forth.


## Learning Python: Part 5 - Variables and Loops

### Variables: Basic units of storage.

In [1]:
# Setting the x variable equal to 10
x = 10

# Setting the first_name variable equal to Alice
first_name = "Alice"

In [2]:
print(x)

10


In [3]:
print(first_name)

Alice


### Data Types in Python

#### Integers: Whole numbers without a decimal point.

In [4]:
age = 25

#### Floats: Numbers with a decimal point.

In [5]:
temperature = 98.6

#### Strings: Sequences of characters enclosed in quotes.

In [6]:
greeting = "Hello, World!"

#### Lists: Ordered collections of items, which can be of different data types.

In [7]:
fruits = ["apple", "banana", "cherry"]

#### Dictionaries: Collections of key-value pairs.

In [8]:
person = {"name": "Alice", "age": 30}

#### Sets: Unordered collections of unique items.

In [9]:
unique_numbers = {1, 2, 3, 4, 5}

#### Booleans: True or False values.

In [10]:
is_student = True
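To check which type a value has, Python's built-in `type()` can be called on it. The sketch below reuses the example values from this section (the `examples` and `type_names` names are illustrative only) and also shows how items are read from a list and a dictionary, and how a set drops duplicates.

```python
# Example values mirroring the ones used in this section
examples = [25, 98.6, "Hello, World!", ["apple", "banana", "cherry"],
            {"name": "Alice", "age": 30}, {1, 2, 3, 4, 5}, True]

# type(value).__name__ gives the short name of each value's type
type_names = [type(value).__name__ for value in examples]
print(type_names)

fruits = ["apple", "banana", "cherry"]
person = {"name": "Alice", "age": 30}

print(fruits[0])              # lists are read by position, starting at 0
print(person["name"])         # dictionaries are read by key
print(len({1, 1, 2, 2, 3}))   # duplicates collapse in a set, leaving 3 items
```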

### Boolean Variables and Operators

In [11]:
is_raining = True
is_sunny = False

# Boolean operators
print(is_raining and is_sunny) # False
print(is_raining or is_sunny) # True
print(not is_raining) # False


False
True
False


### While Loop

In [12]:
count = 0
while count < 5:
    print("Counting:", count)
    count += 1  # increment count by 1

Counting: 0
Counting: 1
Counting: 2
Counting: 3
Counting: 4


### For Loop

In [13]:
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)

apple
banana
cherry
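A `for` loop can also replace the manual counter used in the while loop above. The following is a minimal sketch using the built-ins `range` and `enumerate`; the `counts` and `labeled` lists are illustrative names that only collect the results so they can be printed at the end.

```python
fruits = ["apple", "banana", "cherry"]

# range(5) yields 0, 1, 2, 3, 4, so no manual "count += 1" is needed
counts = []
for count in range(5):
    counts.append(count)
print(counts)

# enumerate pairs each item with its position in the list
labeled = []
for index, fruit in enumerate(fruits):
    labeled.append(f"{index}: {fruit}")
print(labeled)
```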


### If Statement

In [14]:
x = 10
if x > 5:
    print("x is greater than 5")
else:
    print("x is 5 or less")

x is greater than 5


### Code Indentation in Python

In [15]:
# Each level of indentation represents a new block of code. Consistency in using either spaces or tabs for indentation is important to avoid errors.
if x > 5:
    print("This is indented")
    if x > 10:
        print("This is further indented")

This is indented


## Learning Python: Part 6 - Functions and Modules

### Defining and Using Functions

In [1]:
# Define a function using the 'def' keyword, followed by the function name and parentheses.
def greet(name):
    """This function greets the person whose name is passed as an argument."""
    print(f"Hello, {name}!")

In [2]:
# Calling the function
greet("Alice")

Hello, Alice!


In [3]:
# Calling the function with a different name argument
greet("Bob")

Hello, Bob!


### Importing and Using Modules

In [4]:
# Importing a standard Python library module
import math

# Using functions from the math module
print(math.sqrt(16))
print(math.pi)

4.0
3.141592653589793
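Python offers a few equivalent ways to import from a module. As a short sketch, the three forms below all reach the same `sqrt` function; the alias `m` is an arbitrary choice for illustration.

```python
import math                 # access names as math.sqrt, math.pi
from math import sqrt, pi   # bring selected names in directly
import math as m            # give the module a shorter alias

print(math.sqrt(16))
print(sqrt(16))
print(m.sqrt(16))
print(pi == math.pi)        # both names refer to the same constant
```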


### Writing Your First Python Function

In [5]:
# A simple Python function to calculate the factorial of a number.
# The factorial of a non-negative integer n is the product of all positive integers less than or equal to n.
def factorial(n):
    """This function returns the factorial of a given number."""
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Calling the function
print(factorial(5))

120
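The recursive version above never reaches its base case for a negative input, so Python eventually raises a `RecursionError`. One possible alternative, sketched here under the assumption that rejecting invalid input is the desired behavior, uses a loop and checks the argument first. The name `factorial_iterative` is hypothetical, and the result is compared against the standard library's `math.factorial`.

```python
import math


def factorial_iterative(n):
    """Return n! using a loop; raise ValueError for negative or non-integer input."""
    if not isinstance(n, int) or n < 0:
        raise ValueError("n must be a non-negative integer")
    result = 1
    # Multiply by 2, 3, ..., n; the loop body never runs for n = 0 or n = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial_iterative(5))
print(factorial_iterative(5) == math.factorial(5))
```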


### Creating and Importing Your Own Module

Learning_Python_Part_6_My_Module.py has been included as an additional file and it contains two functions:
* add: returns the sum of two numbers.
* subtract: returns the difference of two numbers.
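The contents of that file are not reproduced in this notebook, so the following is only a sketch of what such a module might look like and how it could be imported. The module name `my_module_sketch`, the parameter names, and the temporary folder are assumptions made so that the example runs on its own.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical module source; the real file's contents are not shown here.
module_source = '''
def add(a, b):
    """Return the sum of two numbers."""
    return a + b


def subtract(a, b):
    """Return the difference of two numbers."""
    return a - b
'''

# Write the sketch to a temporary folder and make that folder importable.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_module_sketch.py").write_text(module_source)
sys.path.insert(0, str(module_dir))

import my_module_sketch

print(my_module_sketch.add(7, 3))
print(my_module_sketch.subtract(7, 3))
```

With the provided file saved in the same folder as the notebook, a plain `import Learning_Python_Part_6_My_Module as my_module` should work in the same way, without the temporary-folder steps.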

## Learning Python: Part 7 - Introduction to Libraries NumPy and Pandas

### Importing libraries

To start using NumPy and Pandas, you first need to import them into your Python script. Here's how you can do that:

In [1]:
import numpy as np
import pandas as pd

### Basic Usage of NumPy

NumPy (Numerical Python) is a library used for numerical computations. It provides support for arrays, matrices, and many mathematical functions. NumPy is essential for performing numerical calculations in Python, especially in fields like data science, machine learning, and scientific computing.

Let's create a simple array and perform basic operations using NumPy:

In [2]:
# Creating an array
arr = np.array([1, 2, 3, 4, 5])

# Performing operations
print("Array: ", arr)
print("Sum: ", np.sum(arr))
print("Mean: ", np.mean(arr))
print("Standard Deviation: ", np.std(arr))

Array:  [1 2 3 4 5]
Sum:  15
Mean:  3.0
Standard Deviation:  1.4142135623730951
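Note that `np.std` computes the population standard deviation by default (it divides by n). If the sample standard deviation (dividing by n - 1) is wanted instead, the `ddof` parameter can be set to 1, as in this short sketch; the variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_std = np.std(arr)       # default ddof=0: divide by n
sample_std = np.std(arr, ddof=1)   # ddof=1: divide by n - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```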


### Basic Usage of Pandas

Pandas is a powerful library for data manipulation and analysis. It provides data structures like Series and DataFrame, which make it easy to handle structured data. With Pandas, you can perform operations like filtering, grouping, and merging datasets, making it an indispensable tool for data analysis.

Now, let's create a DataFrame and perform some basic operations using Pandas:

In [3]:
# Creating a DataFrame
data = {'Name': ['Alison', 'Bob', 'Charlie'],
        'Age': [24, 27, 22],
        'City': ['New York', 'Orlando', 'Chicago']}

df = pd.DataFrame(data)

# Displaying the DataFrame
print("DataFrame:\n", df)

# Basic operations
print("Mean Age:", df['Age'].mean())
print("Cities:\n", df['City'].unique())

DataFrame:
       Name  Age      City
0   Alison   24  New York
1      Bob   27   Orlando
2  Charlie   22   Chicago
Mean Age: 24.333333333333332
Cities:
 ['New York' 'Orlando' 'Chicago']
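The filtering, grouping, and merging operations mentioned above can be sketched on the same small DataFrame. The `states` lookup table and its `State` column are hypothetical additions used only to demonstrate a merge.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alison', 'Bob', 'Charlie'],
                   'Age': [24, 27, 22],
                   'City': ['New York', 'Orlando', 'Chicago']})

# Filtering: keep only the rows where Age is above 23
older_than_23 = df[df['Age'] > 23]
print(older_than_23)

# Grouping: count the rows for each city
people_per_city = df.groupby('City').size()
print(people_per_city)

# Merging: attach a column from a second (hypothetical) table by matching on City
states = pd.DataFrame({'City': ['New York', 'Orlando', 'Chicago'],
                       'State': ['NY', 'FL', 'IL']})
merged = df.merge(states, on='City', how='left')
print(merged)
```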


## Learning Python: Part 8 - Working with Data: NumPy Basics

### Creating and Manipulating Arrays

NumPy arrays are more efficient than Python lists for numerical operations because they hold elements of a single data type and apply operations to the whole array at once.
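As a small illustration of the difference, doubling every element of a list needs a loop or comprehension, while a NumPy array accepts the arithmetic directly. The variable names here are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# With a list, each element has to be visited explicitly
doubled_list = [v * 2 for v in values]

# With an array, one expression applies to every element
array = np.array(values)
doubled_array = array * 2

print(doubled_list)
print(doubled_array)

# Caution: multiplying a list by 2 repeats it instead of doubling its elements
print(len(values * 2))
```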

In [1]:
# Import NumPy
import numpy as np

In [2]:
# Create a NumPy array
array = np.array([1, 2, 3, 4, 5])
print("Array: ", array)

Array:  [1 2 3 4 5]


### Basic Array Operations

NumPy allows for a variety of operations on arrays, including mathematical computations and statistical operations.

In [5]:
# Array operations
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Addition
sum_array = array1 + array2
print("Sum: ", sum_array)

# Mean
mean_value = np.mean(array1)
print("Mean: ", mean_value)

# Reshape
reshaped_array = array1.reshape((3, 1))
print("Reshaped Array:\n", reshaped_array)

Sum:  [5 7 9]
Mean:  2.0
Reshaped Array:
 [[1]
 [2]
 [3]]


### Practical Examples with NumPy

Suppose you need to analyze a dataset of temperatures:

In [8]:
temps = np.array([22.5, 24.0, 22.7, 24.3, 27.3])
average_temp = np.mean(temps)
print("Average Temperature: ", average_temp)

Average Temperature:  24.16
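The same array can answer a few more questions. This sketch finds the lowest and highest readings and uses a boolean mask to select the values above the average; the name `above_average` is illustrative.

```python
import numpy as np

temps = np.array([22.5, 24.0, 22.7, 24.3, 27.3])

print("Min:", temps.min())
print("Max:", temps.max())

# temps > temps.mean() builds an array of True/False values,
# which then selects only the matching elements
above_average = temps[temps > temps.mean()]
print("Above average:", above_average)
```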


## Learning Python: Part 9 - Working with Data: Pandas Basics

In [1]:
# Import Pandas
import pandas as pd

### Introduction to Series and DataFrames

Series and DataFrames are the two main data structures in Pandas. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled data structure made up of rows and columns.

In [2]:
# Create a Series
series = pd.Series([1, 2, 3, 4, 5])
print("Series:\n", series)

# Create a DataFrame
data = {'Name': ['Amanda', 'Bill', 'Carol'],
        'Age': [25, 30, 35]}

df = pd.DataFrame(data)
print("DataFrame:\n", df)

Series:
 0    1
1    2
2    3
3    4
4    5
dtype: int64
DataFrame:
      Name  Age
0  Amanda   25
1    Bill   30
2   Carol   35
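To make the "labeled" part concrete, the sketch below builds a Series with custom labels and shows that a single DataFrame column is itself a Series. The `scores` values are made up for illustration.

```python
import pandas as pd

# A Series with custom labels instead of the default 0, 1, 2, ...
scores = pd.Series([90, 85], index=['Amanda', 'Bill'])
print(scores['Bill'])   # look up a value by its label

df = pd.DataFrame({'Name': ['Amanda', 'Bill', 'Carol'],
                   'Age': [25, 30, 35]})

# Selecting one column of a DataFrame returns a Series
ages = df['Age']
print(type(ages).__name__)
```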


### Reading Data from CSV or Excel Files

Pandas makes it easy to read data from various file formats such as CSV and Excel.

In [17]:
# Read from CSV 
favorite_colors_df = pd.read_csv('Learning_Python_Part_9_Example_Colors.csv')

# Read from Excel
survey_df = pd.read_excel('Learning_Python_Part_9_Example_Survey.xlsx')
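If the example files are not at hand, `pd.read_csv` can be tried on in-memory text instead. The sketch below assumes a CSV layout with the columns `id`, `name`, `age`, and `favoriteColor`, and uses three rows matching the values displayed for `favorite_colors_df` in this section; the actual file layout may differ.

```python
import io

import pandas as pd

# Assumed layout of the colors file (only three rows, for illustration)
csv_text = """id,name,age,favoriteColor
1234,Bobby,30,Blue
2345,Susan,16,Orange
3456,Mirjana,29,Yellow
"""

# io.StringIO lets read_csv treat a string as if it were a file
sample_df = pd.read_csv(io.StringIO(csv_text))
print(sample_df.shape)
print(sample_df.dtypes)
```

Reading `.xlsx` files with `pd.read_excel` additionally relies on an Excel engine such as `openpyxl` being installed.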

### Basic Manipulation with Pandas

Pandas provides powerful tools for data manipulation, including filtering, grouping, and summarizing data.

In [10]:
# First, let's take a look at the first 3 rows of the favorite_colors_df
favorite_colors_df.head(3)

     id     name  age favoriteColor
0  1234    Bobby   30          Blue
1  2345    Susan   16        Orange
2  3456  Mirjana   29        Yellow


In [11]:
# Now, let's examine the last 5 rows of the same df by using tail instead of head
favorite_colors_df.tail() # 5 is the default value if no value is provided

     id       name  age favoriteColor
5  6789  Gabrielle   21          Pink
6  7890     Gunter   11          Teal
7  8901    Woodrow   88           Red
8  9012        Tom   64          Blue
9  9999    Abigail   14        Purple


In [14]:
# Example of filtering data
filtered_color_df = favorite_colors_df[favorite_colors_df['age'] > 20]
print("Filtered DataFrame:\n", filtered_color_df)

Filtered DataFrame:
      id       name  age favoriteColor
0  1234      Bobby   30          Blue
2  3456    Mirjana   29        Yellow
3  4567     Nikola   64           Red
4  5678      Louis   52     Turquoise
5  6789  Gabrielle   21          Pink
7  8901    Woodrow   88           Red
8  9012        Tom   64          Blue


In [16]:
# Example of grouping data
grouped_color_df = favorite_colors_df.groupby('favoriteColor').size()
print("Grouped DataFrame:\n", grouped_color_df)

Grouped DataFrame:
 favoriteColor
Blue         2
Orange       1
Pink         1
Purple       1
Red          2
Teal         1
Turquoise    1
Yellow       1
dtype: int64
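Grouping becomes more useful when combined with a summary of another column. As a sketch, the DataFrame below is rebuilt by hand from the rows shown in the outputs above (so it does not need the CSV file), and `agg` computes both the count and the mean age for each color. The `colors` and `summary` names are illustrative.

```python
import pandas as pd

# Rebuilt from the rows displayed in the outputs above
colors = pd.DataFrame({
    'name': ['Bobby', 'Susan', 'Mirjana', 'Nikola', 'Louis',
             'Gabrielle', 'Gunter', 'Woodrow', 'Tom', 'Abigail'],
    'age': [30, 16, 29, 64, 52, 21, 11, 88, 64, 14],
    'favoriteColor': ['Blue', 'Orange', 'Yellow', 'Red', 'Turquoise',
                      'Pink', 'Teal', 'Red', 'Blue', 'Purple'],
})

# For each color: how many people chose it, and their average age
summary = colors.groupby('favoriteColor')['age'].agg(['count', 'mean'])
print(summary)
```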


### Melt and Pivot with Survey Data

Pandas' melt and pivot functions are useful for reshaping data between wide and long formats. Here's an example using the survey data we read in earlier from the Excel file.
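Before applying these functions to the full survey, a two-row sketch may make the reshaping easier to follow. The small `wide` table below copies two respondents and two questions from the survey data, and `reset_index` is one way to turn the pivoted result back into ordinary columns. Passing a list to `index` in `pivot` assumes pandas 1.1 or later.

```python
import pandas as pd

wide = pd.DataFrame({'ID': [1000, 1001],
                     'Name': ['Bogdan', 'Alejandro'],
                     'Q1': [1, 3],
                     'Q2': [5, 4]})

# Wide to long: one row per (person, question) pair
long_df = pd.melt(wide, id_vars=['ID', 'Name'],
                  var_name='Question', value_name='Score')
print(long_df)

# Long back to wide, then move ID and Name from the index into columns
restored = (long_df.pivot(index=['ID', 'Name'], columns='Question', values='Score')
                   .reset_index())
restored.columns.name = None   # drop the leftover "Question" label
print(restored)
```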

In [18]:
# Examine the data
survey_df.head()

     ID       Name  Q1  Q2  Q3  Q4  Q5
0  1000     Bogdan   1   5   2   4   4
1  1001  Alejandro   3   4   4   5   4
2  1002       Will   4   3   5   2   3
3  1003  Anastasia   2   2   2   4   5
4  1004  Henrietta   5   3   1   3   1


In [20]:
# Melt the DataFrame from wide to long format
melted_survey_df = pd.melt(survey_df, id_vars=['ID', 'Name'], var_name='Question', value_name='Score')
print("Melted DataFrame:\n", melted_survey_df)

# Pivot the DataFrame back from long to wide format
pivoted_survey_df = melted_survey_df.pivot(index=['ID', 'Name'], columns='Question', values='Score')
print("Pivoted DataFrame:\n", pivoted_survey_df)


Melted DataFrame:
       ID       Name Question  Score
0   1000     Bogdan       Q1      1
1   1001  Alejandro       Q1      3
2   1002       Will       Q1      4
3   1003  Anastasia       Q1      2
4   1004  Henrietta       Q1      5
..   ...        ...      ...    ...
70  1010       Ozan       Q5      4
71  1011    Alberto       Q5      5
72  1012     Darius       Q5      4
73  1013   Penelope       Q5      5
74  1014   Katerina       Q5      4

[75 rows x 4 columns]
Pivoted DataFrame:
 Question        Q1  Q2  Q3  Q4  Q5
ID   Name                         
1000 Bogdan      1   5   2   4   4
1001 Alejandro   3   4   4   5   4
1002 Will        4   3   5   2   3
1003 Anastasia   2   2   2   4   5
1004 Henrietta   5   3   1   3   1
1005 James       5   5   3   5   2
1006 Barbara     4   4   5   4   5
1007 Beatrice    1   5   4   5   4
1008 Amira       2   5   2   4   3
1009 Boris       3   5   4   5   5
1010 Ozan        3   5   5   1   4
1011 Alberto     3   5   3   5   5
1012 Darius     