# Introduction to Python

Python is a high-level, interpreted, interactive, and object-oriented scripting language. Python is designed to be highly readable: it frequently uses English keywords where other languages use punctuation, and it has fewer syntactical constructions than other languages.

Now that you have gone through the basics of computer programming, let's dive deeper into Python and the ways in which we can do things Pythonically.

## Table of Contents

1. [Object Oriented Programming](#object-oriented-programming)
2. [Pythonics](#pythonics)
3. [Data Handling](#data-handling)
4. [Machine Learning](#machine-learning)
5. [Visualization](#visualization)
6. [Application Programming Interface](#application-programming-interface)
7. [Python Environments](#python-environments)
8. [Code Segmentation](#code-segmentation)
9. [Useful Plugins](#useful-plugins)
10. [Markdown](#markdown)


## Object Oriented Programming

Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data, in the form of fields (often known as attributes or properties), and code, in the form of procedures (often known as methods). For example, a car is an object which has certain properties such as color, model, and certain methods such as drive, stop, and so on.

### Classes and Objects

A class is a blueprint for an object. We can think of a class as a sketch (prototype) of a house: it contains all the details about the floors, doors, windows, etc. Based on these descriptions we build the house, and the house is the object. When we create an object from a class, we are creating an instance of the class. This process is known as instantiation.

```python
# Create a class

class Dog:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def sit(self):
        print(f"{self.name} is now sitting.")

    def roll_over(self):
        print(f"{self.name} rolled over!")

my_dog = Dog('Willie', 6)
print(f"My dog's name is {my_dog.name}.")
print(f"My dog is {my_dog.age} years old.")

my_dog.sit()
my_dog.roll_over()
```

```
My dog's name is Willie.
My dog is 6 years old.
Willie is now sitting.
Willie rolled over!
```


### Important Concepts

1. Inheritance
    - Inheritance is a way of creating a new class for using details of an existing class without modifying it. The newly formed class is a derived class (or child class). Similarly, the existing class is a base class (or parent class). 
2. Encapsulation
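    - Encapsulation is a mechanism to restrict direct access to some of an object's components. Languages such as Java and C++ implement it with access specifiers (`public`, `private`, `protected`) that define the scope and visibility of a class member. Python has no access specifiers; instead it relies on naming conventions: a leading underscore marks a member as internal, and a leading double underscore triggers name mangling.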
3. Polymorphism
    - Polymorphism is an ability (in OOP) to use a common interface for multiple forms (data types). It allows objects of different classes to be treated as objects of a common superclass.
4. Abstraction
    - Abstraction is the concept of object-oriented programming that "shows" only essential attributes and "hides" unnecessary information. The main purpose of abstraction is hiding the unnecessary details from the users.

```python
# Example of inheritance

class SARDog(Dog):
    def __init__(self, name, age):
        super().__init__(name, age)

    def search(self):
        print(f"{self.name} is now searching.")

my_dog = SARDog('Willie', 6)

my_dog.search()

# Example of encapsulation

class Car:
    def __init__(self):
        self.__updateSoftware()

    def drive(self):
        print('driving')

    def __updateSoftware(self):
        print('updating software')

redcar = Car()
redcar.drive()
# redcar.__updateSoftware()  # AttributeError: not accessible from outside the class

# Example of polymorphism

class Parrot:
    def fly(self):
        print("Parrot can fly")

    def swim(self):
        print("Parrot can't swim")

class Penguin:
    def fly(self):
        print("Penguin can't fly")

    def swim(self):
        print("Penguin can swim")

def flying_test(bird):
    bird.fly()

blu = Parrot()
peggy = Penguin()

flying_test(blu)
flying_test(peggy)
```

```
Willie is now searching.
updating software
driving
Parrot can fly
Penguin can't fly
```
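
The cell above covers inheritance, encapsulation, and polymorphism, but not abstraction. One common way to express abstraction in Python is the standard library's `abc` module. The sketch below uses hypothetical `Shape` and `Square` classes: the abstract base class declares *what* every shape must provide (an `area()` method) while hiding *how* it is computed, and it cannot be instantiated on its own.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self):
        """Every concrete shape must implement its own area()."""

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

print(Square(3).area())   # 9

try:
    Shape()               # abstract classes cannot be instantiated
except TypeError as err:
    print("TypeError:", err)
```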


### Benefits of Object Oriented Programming

1. **Modularity for easier troubleshooting**
    - Something has gone wrong, and you have no idea where to look. Is the problem in the Widget, or is it the Wodget? If objects are well defined, you can more easily narrow down the problem to a particular object.
2. **Reuse of code through inheritance**
    - If an object already exists (perhaps written by someone else in your company), you can use that object in your program. And, if you have an object that almost does what you want but just needs a little extra functionality, you can create a new object based on the existing object.
3. **Flexibility through polymorphism**
    - If you have a collection of objects of different classes and want to iterate over them calling the same method on each, polymorphism lets you treat them all the same way, as long as each one provides that method.
4. **Interface Descriptions**
    - Objects can be described by the interfaces they support. If an object says it supports a particular interface, it means that the object has certain methods and properties.

Object Oriented Programming is a very important concept in Python. It is the foundation of many libraries and frameworks in Python. In order to better understand the concept of Object Oriented Programming, you can refer to the following resources:

1. [Real Python](https://realpython.com/python3-object-oriented-programming/)
2. [Geeks for Geeks](https://www.geeksforgeeks.org/object-oriented-programming-in-python-set-1-class-and-its-members/)
3. [Programiz](https://www.programiz.com/python-programming/object-oriented-programming)  

## Pythonics

To clarify, "Pythonics" is a word I made up, but people often describe code that follows Python's best practices as *Pythonic*. Python is a very versatile language, known for its simplicity and readability. Some ways of writing Python code are more efficient and readable than others, and some attributes of Python are unique to it and change the way we write code.

### The Zen of Python

The Zen of Python is a collection of 19 aphorisms that capture the guiding principles behind the design of the Python programming language. It was written by Tim Peters, and you can print it from any Python interpreter:

```python
import this
```

### Try and Except

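In Python, we can use `try` and `except` blocks to handle exceptions. Instead of checking every value up front with if-else blocks ("look before you leap"), idiomatic Python often attempts the operation and handles the failure if it occurs ("easier to ask forgiveness than permission"). This keeps the common path short and saves you from having to anticipate every kind of invalid input.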

```python
# Add all numbers in a list, skipping items that cannot be added

mis_list = ['teh', 'smae', 'htis', 'wrod']
int_list = [1, 2, 3, 4, 5]
g_list = ['apple', 'banana', 'cherry', 'date', 'elderberry']
combo_list = mis_list + int_list + g_list

def sum_numbers(items):
    total = 0
    for item in items:
        try:
            total += item
        except TypeError:
            # Strings cannot be added to an int, so skip them
            continue
    return total

print(sum_numbers(combo_list))
```

```
15
```


### Value Swapping and Multiple Assignment

In Python, we can swap the values of two variables without using a temporary variable. We can also assign multiple values to multiple variables in a single line.

```python
fruits = ['apple', 'banana', 'cherry']
# Unpack the list into three variables in a single line
f1, f2, f3 = fruits

print(f1)
```

```
apple
```
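
The cell above shows multiple assignment. Swapping works the same way, because Python evaluates the whole right-hand side into a tuple before unpacking it into the names on the left. A minimal sketch (the variable names are arbitrary):

```python
# Swap two values without a temporary variable
a, b = 1, 2
a, b = b, a          # the right-hand side (2, 1) is evaluated first
print(a, b)          # 2 1

# Assign several values in one line
x, y, z = 10, 20, 30

# Starred assignment collects "the rest" into a list
first, *rest = [1, 2, 3, 4]
print(first, rest)   # 1 [2, 3, 4]
```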


### Passing Multiple Arguments

In Python, we can pass a variable number of arguments to a function using the `*args` and `**kwargs` syntax: extra positional arguments are collected into a tuple (`args`) and extra keyword arguments into a dictionary (`kwargs`). The same `*` and `**` operators can also unpack lists and dictionaries, which makes it easy to merge two lists or two dictionaries. If you try to merge two lists by simply putting both variables inside a new list, you get a list of lists, which is usually not what you want.

```python
# Merge two lists of tuples

list1 = [(1, 2), (3, 4), (5, 6)]
list2 = [(1, 2), (3, 4), (5, 6)]
long_list = [*list1, *list2]       # unpack both lists into one flat list
incorrect_list = [list1, list2]    # a list containing two lists

print(long_list)
print(incorrect_list)

# Merge two dictionaries

dict1 = {'a': 1, 'b': 2}
dict2 = {'c': 3, 'd': 4}
long_dict = {**dict1, **dict2}     # on duplicate keys, the later value wins

print(long_dict)
```

```
[(1, 2), (3, 4), (5, 6), (1, 2), (3, 4), (5, 6)]
[[(1, 2), (3, 4), (5, 6)], [(1, 2), (3, 4), (5, 6)]]
{'a': 1, 'b': 2, 'c': 3, 'd': 4}
```
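
The merging example uses `*` and `**` to unpack collections, but does not show `*args` and `**kwargs` in a function. A minimal sketch of both directions, using a hypothetical `describe` function:

```python
def describe(*args, **kwargs):
    # args is a tuple of extra positional arguments,
    # kwargs is a dict of extra keyword arguments
    return args, kwargs

positional, keyword = describe(1, 2, color="red", size=3)
print(positional)  # (1, 2)
print(keyword)     # {'color': 'red', 'size': 3}

# Unpacking at the call site: spread a list into positional
# arguments and a dict into keyword arguments
numbers = [3, 4]
options = {"sep": "-"}
print(*numbers, **options)  # 3-4
```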


### Comprehension

List comprehension is a concise way to loop through all elements of a list and apply an expression to each of them. The same effect can be achieved with a for loop, but a list comprehension is usually more readable and often faster.

Lambda functions are small anonymous functions. They can have any number of arguments but only one expression. They are used when you need a small function that you will only use once. While list comprehensions and lambda functions are very useful to clean up your code, they can also make your code less readable. So don't force their usage if it makes your code less readable.

```python
# Example of list comprehension

list1 = [1, 2, 3, 4, 5]
list2 = [x**2 for x in list1]

# Equivalent for loop
result = []
for x in list1:
    result.append(x**2)

print(list2)
print(result)  # Same result

# Example of dictionary comprehension

dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
dict2 = {k: v**2 for (k, v) in dict1.items()}
print(dict2)

# Example of an anonymous function
# (PEP 8 prefers `def` for named functions; lambdas are most useful inline,
# for example as the key argument to sorted())

double = lambda x: x * 2
print(double(5))
```

```
[1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]
{'a': 1, 'b': 4, 'c': 9, 'd': 16}
10
```


### Underscores in Python

In Python, there is more than one way of using underscores. Each of these has a different meaning. 

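1. Single Underscore
    - A single leading underscore (`_name`) marks a variable, method, or class as internal (non-public). It is only a convention and does not actually make the name private. A lone underscore (`_`) is also commonly used as a throwaway variable name, as in `for _ in range(1000)`.
2. Double Underscore
    - A double leading underscore (`__name`) causes the Python interpreter to rewrite the attribute name (to `_ClassName__name`) in order to avoid naming conflicts in subclasses. This is called name mangling.
3. Leading and Trailing Double Underscores
    - If a name starts and ends with double underscores (`__init__`, `__len__`), it is a special ("dunder" or magic) method or attribute that Python uses implicitly. These names are reserved for Python, so avoid inventing your own.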

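```python
class House:
    # __init__ is a dunder (magic) method: Python calls it automatically
    # when a new instance is created
    def __init__(self, price):
        self.price = price

h1 = House(100000)
h2 = House(150000)

print(h1.price)
```

```
100000
```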
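
The `House` example only shows a dunder method. The sketch below, using a hypothetical `Account` class, puts all three underscore patterns side by side, including how name mangling renames a double-underscore attribute:

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance   # single underscore: internal by convention only
        self.__pin = "0000"       # double underscore: stored as _Account__pin

    def __repr__(self):           # dunder method, called by repr()
        return f"Account(owner={self.owner!r})"

acct = Account("Ann", 50)
print(acct._balance)              # 50 (still accessible; the underscore is only a hint)
print(hasattr(acct, "__pin"))     # False, because the name was mangled
print(acct._Account__pin)         # 0000
print(repr(acct))                 # Account(owner='Ann')
```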


### A Note on Indentation

In Python, indentation is not just a matter of style or readability, it's a matter of syntax. Python uses indentation to define blocks of code. For example, the code within a function, loop, if statement, or class must be indented.

Here is an example:

```python
def greet(name):
    print(f"Hello, {name}!")
```

In this example, the `print` statement is indented to show that it's part of the `greet` function. If it weren't indented, Python would raise an `IndentationError`.
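
To see that error without breaking your own file, you can compile the unindented version from a string. This is only an illustrative sketch; the exact wording of the error message differs between Python versions:

```python
# The body of greet is not indented, so this source is invalid
source = "def greet(name):\nprint(f'Hello, {name}!')\n"

try:
    compile(source, "<example>", "exec")
    raised = False
except IndentationError as err:
    raised = True
    print("IndentationError:", err.msg)
```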

Python doesn't require a specific number of spaces for indentation, but the indentation must be consistent within each block, and mixing tabs and spaces inconsistently raises a `TabError`. The official Python style guide (PEP 8) recommends using 4 spaces for each level of indentation.

In many other programming languages, such as C++, Java, and JavaScript, indentation is used to improve readability, but it's not part of the syntax. These languages use braces `{}` to define blocks of code, and semicolons `;` to separate statements.

Here's an example in JavaScript:

```javascript
function greet(name) {
    console.log(`Hello, ${name}!`);
}
```

In this example, the `console.log` statement is indented to show that it's part of the `greet` function, but the code would still work if it weren't indented. The braces `{}` show where the function starts and ends.

Indentation in Python is a syntactic requirement and an integral part of the language. It enforces a clean and consistent coding style. In contrast, in many other languages, indentation is optional and used for readability. This difference makes Python unique and contributes to its reputation for readability and ease of learning.

## Data Handling

Data handling is an important aspect of programming. In Python, we can use lists, tuples, dictionaries, sets, and many other data structures to store and manipulate data. We can also use libraries such as NumPy and Pandas to handle data more efficiently, and Matplotlib to visualize it.

### Built-in Data Structures

1. Lists
    - A list is a collection which is ordered and changeable. In Python, lists are written with square brackets: `[1, 2, 3]`.
2. Tuples
    - A tuple is a collection which is ordered and unchangeable. In Python, tuples are written with round brackets: `(1, 2, 3)`.
3. Sets
    - A set is a collection which is unordered, unindexed, and contains no duplicates. In Python, sets are written with curly brackets: `{1, 2, 3}` (use `set()` for an empty set, since `{}` creates an empty dictionary).
4. Dictionaries
    - A dictionary is a changeable collection of key-value pairs, indexed by its keys. Since Python 3.7, dictionaries preserve insertion order. In Python, dictionaries are written with curly brackets: `{'a': 1, 'b': 2}`.

In my personal experience, you're going to be using lists and dictionaries the most. Lists are used to store multiple items in a single variable. Dictionaries are used to store key-value pairs.

You would use a list when you have a collection of items that you want to keep in order. For example, you might use a list to store the names of all the students in a class. You would use a dictionary when you have a collection of items that you want to access by a unique key. For example, you might use a dictionary to store the email addresses of all the students in a class, with the student names as the keys.

You can nest lists and dictionaries inside each other to create more complex data structures. For example, you might use a list of dictionaries to store information about a collection of students, with each dictionary representing a student. You can similarly nest lists within dictionaries and lists within lists and dictionaries within dictionaries.

You may be tempted to always use dictionaries as they are more flexible than lists, but remember that lists are more efficient for storing ordered collections of items. As a rule of thumb, use lists when you need to keep items in order, and use dictionaries when you need to access items by a unique key.

One reason you may want to use a list over a dictionary is if you need to access items by their position. Lists support integer indexing and slicing (`students[0]`, `students[:2]`). Dictionaries preserve insertion order (since Python 3.7), but you look items up by key, not by position.

One way to get position-like access from a dictionary is to use integer keys. For example, you could store student names as the values and student IDs as the keys, so that you can look up each student by ID. This way you keep the flexibility of dictionaries while still being able to retrieve items by number. The example below takes a related approach: a list of dictionaries, where each student gets a `student_number` based on their position in the sorted list:

```python
# List of student names, ages, and gender
students = ['John', 'Mary', 'Mike', 'Jane']
ages = [15, 16, 17, 18]
gender = ['M', 'F', 'M', 'F']

# Create a list of dictionaries, one per student
student_list = [{'name': student, 'age': age, 'gender': gen} for student, age, gen in zip(students, ages, gender)]
print(student_list)

# Sort by name in ascending order
student_list.sort(key=lambda x: x['name'])

# Assign a student number based on the order of the list
for i, student in enumerate(student_list):
    student['student_number'] = i + 1

print(f"First student: {student_list[0]}")

# Print all male students who are above 16 years old
for student in student_list:
    if student['gender'] == 'M' and student['age'] > 16:
        print(f"Male student above 16: {student}")

# Slice the list to get the first two students
first_two_students = student_list[:2]
print(f"First two students: {first_two_students}")
```

```
[{'name': 'John', 'age': 15, 'gender': 'M'}, {'name': 'Mary', 'age': 16, 'gender': 'F'}, {'name': 'Mike', 'age': 17, 'gender': 'M'}, {'name': 'Jane', 'age': 18, 'gender': 'F'}]
First student: {'name': 'Jane', 'age': 18, 'gender': 'F', 'student_number': 1}
Male student above 16: {'name': 'Mike', 'age': 17, 'gender': 'M', 'student_number': 4}
First two students: [{'name': 'Jane', 'age': 18, 'gender': 'F', 'student_number': 1}, {'name': 'John', 'age': 15, 'gender': 'M', 'student_number': 2}]
```
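
For comparison, here is a minimal sketch of the integer-key approach described before the example: a dictionary that maps a (hypothetical) student ID to each name. Lookups go through the key, and "slicing" needs an explicit step because dictionaries do not support `[:2]`:

```python
names = ['Jane', 'John', 'Mary', 'Mike']

# Map student IDs (1, 2, 3, ...) to names
students_by_id = {i + 1: name for i, name in enumerate(names)}
print(students_by_id)      # {1: 'Jane', 2: 'John', 3: 'Mary', 4: 'Mike'}

# Look up a student by ID
print(students_by_id[3])   # Mary

# Take the first two IDs; keys are sorted explicitly to be safe
first_two = {k: students_by_id[k] for k in sorted(students_by_id)[:2]}
print(first_two)           # {1: 'Jane', 2: 'John'}
```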


### NumPy

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays. Like Python lists and tuples, NumPy arrays are zero-indexed, meaning that the first element is at index 0, the second element is at index 1, and so on.

You can also combine NumPy arrays with other NumPy arrays or with Python lists and tuples (for example, with `np.concatenate()`). This can be useful if you need to combine data from different sources, or if you need to perform operations on multiple arrays at once.

Aside from matrix operations, you may also want to use NumPy arrays when you need to perform operations on large datasets. NumPy arrays are more efficient than Python lists for this, because they store elements of a single data type in a contiguous block of memory and their operations run as compiled C loops, avoiding the overhead of the Python interpreter handling each element individually.

When performing mathematical operations on the elements of a list, you have to loop through the list and perform the operation on each element one at a time. This can be slow for large datasets. NumPy arrays let you apply an operation to all elements of the array at once (vectorization), which is much faster than looping through each element in Python.

```python
# What happens when you apply a mathematical operator directly to a list:
# `*` repeats the list instead of multiplying its elements
list1 = [1, 2, 3, 4, 5]
list1 = list1 * 10
print(list1)

# Comparison between NumPy arrays and Python lists
# (exact timings vary between machines and runs)
import numpy as np
import time

# Create a list of 10 million elements
list1 = list(range(10000000))
list2 = list(range(10000000))

# Create a NumPy array of 10 million elements
array1 = np.array(list1)
array2 = np.array(list2)

# Multiply each element in the list by 2
start = time.time()
list1 = [x * 2 for x in list1]
end = time.time()
print(f"Time taken to multiply each element in a list by 2: {end - start}")

# Multiply each element in the NumPy array by 2
start = time.time()
array1 = array1 * 2
end = time.time()
print(f"Time taken to multiply each element in a numpy array by 2: {end - start}")
```

```
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
Time taken to multiply each element in a list by 2: 0.25002074241638184
Time taken to multiply each element in a numpy array by 2: 0.016370058059692383
```




At the risk of turning this into a math camp, here's a quick reminder about matrix multiplication. When you multiply two matrices, the number of columns in the first matrix must equal the number of rows in the second matrix. The resulting matrix has the same number of rows as the first matrix and the same number of columns as the second matrix. The element in row *i*, column *j* of the result is the dot product of row *i* of the first matrix and column *j* of the second.
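
To make the row-by-column rule concrete, the sketch below computes a small matrix product by hand with explicit loops and checks it against NumPy's `@` operator. The matrices are arbitrary example values:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])       # shape (3, 2)

# Columns of A (3) match rows of B (3), so the result has shape (2, 2)
C = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        # Entry (i, j) is the dot product of row i of A and column j of B
        C[i, j] = np.dot(A[i, :], B[:, j])

print(C)                         # [[ 58  64]
                                 #  [139 154]]
print(np.array_equal(C, A @ B))  # True

# Mismatched inner dimensions: (2, 3) @ (2, 3) is not defined
try:
    A @ A
except ValueError as err:
    print("ValueError:", err)
```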

There are two ways you can multiply matrices in NumPy: the `np.dot()` function or the `@` operator (equivalent to `np.matmul()`). For 2-D arrays they give the same result, and `@` is more concise. They differ at the edges: `np.dot()` also accepts a scalar (it simply scales the array), while `@` rejects scalars; and for arrays with more than two dimensions, `@` treats them as stacks of matrices, whereas `np.dot()` computes a different sum-product.
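
A small sketch of those differences, using arbitrary example values and arrays of ones for the 3-D case:

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[5, 6], [7, 8]])

# For 2-D arrays, np.dot and @ compute the same matrix product
print(np.array_equal(np.dot(M, N), M @ N))   # True

# np.dot accepts a scalar and just scales the array ...
print(np.dot(M, 2))
# ... whereas @ (np.matmul) refuses scalar operands
try:
    M @ 2
except ValueError as err:
    print("ValueError:", err)

# For 3-D inputs, @ multiplies matching matrices in the two stacks,
# while np.dot sums over the last axis of a and the second-to-last of b
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))
print((a @ b).shape)        # (2, 3, 5)
print(np.dot(a, b).shape)   # (2, 3, 2, 5)
```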

If you want to multiply two matrices element-wise, you can use the `*` operator, which multiplies the corresponding elements of the two matrices. This is different from matrix multiplication, where each element of the result is the dot product of a row of the first matrix with a column of the second.

You can also apply exponentiation to a NumPy array using the `**` operator. This will raise each element of the array to the power of the exponent. This can also be done with the `np.power()` function.
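
The difference between `*`, `@`, and `**` is easiest to see on a small 2x2 example (values chosen arbitrarily):

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
N = np.array([[5, 6], [7, 8]])

elementwise = M * N       # multiply matching elements
matrix_product = M @ N    # rows of M dotted with columns of N
squared = M ** 2          # raise each element to the power 2

print(elementwise)        # [[ 5 12]
                          #  [21 32]]
print(matrix_product)     # [[19 22]
                          #  [43 50]]
print(np.array_equal(squared, np.power(M, 2)))  # True
```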

To perform a Cholesky decomposition on a matrix, you can use the `np.linalg.cholesky()` function. It expects a symmetric (more generally, Hermitian) positive-definite matrix and returns the lower-triangular factor `L`, such that for a real matrix `L @ L.T` reproduces the original. If the matrix is not positive definite, the function raises a `LinAlgError`.
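
A minimal sketch with a small symmetric positive-definite matrix, plus a symmetric matrix that is not positive definite (its eigenvalues are 3 and -1) to trigger the error:

```python
import numpy as np

S = np.array([[4.0, 2.0],
              [2.0, 3.0]])      # symmetric positive-definite
L = np.linalg.cholesky(S)
print(L)                        # lower-triangular factor
print(np.allclose(L @ L.T, S))  # True: L times its transpose rebuilds S

not_pd = np.array([[1.0, 2.0],
                   [2.0, 1.0]])  # symmetric, but not positive definite
try:
    np.linalg.cholesky(not_pd)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)
```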

You can also apply a Python function to each element of a NumPy array using `np.vectorize()`. It returns a new array with the same shape as the original, where each element is the result of applying the function to the corresponding element. This is useful when a function was not written to work on whole arrays, but note that `np.vectorize()` is a convenience rather than a speed-up: internally it is essentially a for loop over the elements.

```python
array1 = np.array([1, 2, 3, 4, 5])
array2 = np.array([6, 7, 8, 9, 10])

# Dot product of two arrays
dot_product = np.dot(array1, array2)
print(dot_product)

# Element-wise multiplication of two arrays
element_wise = array1 * array2
print(element_wise)

# Raise each element in the array to the power 2
exponential = np.power(array1, 2)
print(exponential)

# Apply a function to each element in the array
def func(x):
    return x**2

result = np.vectorize(func)(array1)
print(result)
```

```
130
[ 6 14 24 36 50]
[ 1  4  9 16 25]
[ 1  4  9 16 25]
```


### Pandas

Pandas is a fast, powerful, flexible and easy to use open source data analysis and data manipulation library built on top of the Python programming language. It is used in a variety of fields, including data analysis, data visualization, and machine learning. Pandas is built on top of NumPy, so it is compatible with NumPy arrays.

Pandas provides two main data structures for working with data: Series and DataFrame. A Series is a one-dimensional, labeled array-like object that can hold any data type. A DataFrame is a two-dimensional, tabular data structure. You can think of a DataFrame as a table in which each column is a Series, and all columns share the same row index.

You can create a Series object by passing a list, tuple, or dictionary to the Series constructor. You can create a DataFrame object by passing a dictionary of Series objects to the DataFrame constructor. You can also create a DataFrame object by passing a list of dictionaries to the DataFrame constructor. This will create a DataFrame where each dictionary is a row in the table.

You can access an element of a Series by passing its index label in square brackets (`s['b']`). Passing a column name to a DataFrame (`df['age']`) returns that column as a Series. To access a single element of a DataFrame, pass a row label and a column name to `.loc[]` (`df.loc[0, 'name']`); to get a whole row, pass just the row label (`df.loc[0]`), which returns the row as a Series. Note that `df[0]` looks for a *column* named `0`, not the first row.
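
Here is a minimal sketch of those constructors and access patterns; the data is made up for illustration:

```python
import pandas as pd

# A Series from a list, with explicit index labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])                 # 20

# The same DataFrame built from a dict of columns and from a list of rows
df = pd.DataFrame({"name": ["John", "Mary"], "age": [15, 16]})
df_rows = pd.DataFrame([{"name": "John", "age": 15},
                        {"name": "Mary", "age": 16}])
print(df.equals(df_rows))     # True

print(df["age"])              # a column, returned as a Series
print(df.loc[1, "name"])      # a single element: Mary
row = df.loc[0]               # a whole row, returned as a Series
print(row["name"])            # John
```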

I won't be able to cover all the functionality of Pandas in this introduction, but I will cover some of the most important features. You can use the `head()` method to view the first few rows of a DataFrame. You can use the `tail()` method to view the last few rows of a DataFrame. You can use the `info()` method to view information about a DataFrame, including the data types of each column and the number of non-null values in each column. You can use the `describe()` method to view summary statistics about a DataFrame, including the mean, standard deviation, minimum, maximum, and quartiles of each column.

You can read up on the documentation for Pandas [here](https://pandas.pydata.org/docs/).

```python
import pandas as pd
import random

# Generate some random normal data
data = {'A': [random.normalvariate(0, 1) for _ in range(1000)], 'B': [random.normalvariate(0, 1) for _ in range(1000)]}

# Create a pandas DataFrame
df = pd.DataFrame(data)

# First 5 rows of the DataFrame
# (a notebook only displays the last expression in a cell; use print() in a script)
df.head()

# Last 5 rows of the DataFrame
df.tail()

# Column data types and non-null counts (info() prints directly)
df.info()

# Summary statistics (displayed because it is the last expression)
df.describe()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       1000 non-null   float64
 1   B       1000 non-null   float64
dtypes: float64(2)
memory usage: 15.8 KB
```

|       | A         | B         |
|-------|-----------|-----------|
| count | 1000.0    | 1000.0    |
| mean  | -0.016097 | -0.065176 |
| std   | 0.960789  | 0.991133  |
| min   | -3.23643  | -3.370415 |
| 25%   | -0.682249 | -0.713249 |
| 50%   | -0.011206 | -0.084829 |
| 75%   | 0.627133  | 0.593196  |
| max   | 3.052212  | 3.102873  |


Indexing and slicing in Pandas is a bit different from NumPy. In NumPy, you use integer positions to access elements in an array. In Pandas, you use column names to select columns and row labels (the index) to select rows. The `.loc[]` indexer selects rows and columns by label, and the `.iloc[]` indexer selects them by integer position; both accept a row selector and an optional column selector, as in `df.loc[0:4, 'A']` or `df.iloc[0:5, 0]`.

In the DataFrame above, the row labels happen to be the integers 0 to 999, so labels and positions coincide. In general, though, you should avoid relying on integer positions to access elements in a DataFrame: after sorting or filtering, the row at position 0 is no longer necessarily the row labeled 0. Using column names and row labels is more explicit and less error-prone.

If you are given a DataFrame with well-defined column names and row labels, you can access elements either with `.loc[]` or by selecting the column by name and then the row by label (`df['A'][0]`). You can also apply a function to each element of a column (a Series) using the `apply()` method.
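
The sketch below shows why labels are safer than positions once a DataFrame has been sorted, and uses `apply()` on a column. The data is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Mary", "Mike"], "age": [15, 16, 17]})

# Sorting keeps each row's label but changes its position
sorted_df = df.sort_values("age", ascending=False)

print(sorted_df.loc[0, "name"])   # label 0 is still John
print(sorted_df.iloc[0]["name"])  # position 0 is now Mike

# apply() calls the function once per element of the Series
next_year = sorted_df["age"].apply(lambda age: age + 1)
print(next_year.tolist())         # [18, 17, 16]
```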

```python
# Let's reuse the student list from earlier
student_df = pd.DataFrame(student_list)

# Set the index of the DataFrame to the student number
student_df.set_index('student_number', inplace=True)
student_df.head()

# For students with an even student number, count the number of vowels in their name
def count_vowels(name):
    vowels = 'aeiou'
    return sum(1 for char in name if char.lower() in vowels)

even = student_df.index % 2 == 0
student_df.loc[even, 'vowel_count'] = student_df.loc[even, 'name'].apply(count_vowels)
student_df.head()

# Group the students by the number of vowels in their name
# (odd-numbered students have NaN in vowel_count and are left out of the groups)
grouped = student_df.groupby('vowel_count')
grouped.groups
```

```
{1.0: [2], 2.0: [4]}
```

There may be times when you want to join two DataFrames together. The `merge()` function joins them on one or more key columns: you specify the key columns with the `on` parameter and the type of join (`'inner'`, `'left'`, `'right'`, or `'outer'`) with the `how` parameter. The `concat()` function instead stacks DataFrames: the `axis` parameter controls whether they are stacked vertically (`axis=0`) or side by side (`axis=1`), and the `join` parameter (`'outer'` or `'inner'`) controls how the labels on the other axis are combined.

```python
# Define two DataFrames with a common column
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 3], 'C': [7, 8, 9]})

# Merge the two DataFrames on the common column (result: columns A, B, C)
df3 = pd.merge(df1, df2, on='A')
df3

# Concatenate the two DataFrames side by side; column A appears twice
df4 = pd.concat([df1, df2], axis=1)
df4
```

|   | A | B | A | C |
|---|---|---|---|---|
| 0 | 1 | 4 | 1 | 7 |
| 1 | 2 | 5 | 2 | 8 |
| 2 | 3 | 6 | 3 | 9 |
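
With identical keys, as above, every join type gives the same rows. The sketch below uses keys that only partly overlap (made-up values) so that the effect of `how`, and of `concat()` with `join='inner'`, is visible:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "B": [4, 5, 6]})
right = pd.DataFrame({"key": [2, 3, 4], "C": [8, 9, 10]})

inner = pd.merge(left, right, on="key", how="inner")      # keys in both: 2, 3
left_join = pd.merge(left, right, on="key", how="left")   # all left keys: 1, 2, 3
outer = pd.merge(left, right, on="key", how="outer")      # all keys, NaN where missing

print(inner["key"].tolist())      # [2, 3]
print(left_join["key"].tolist())  # [1, 2, 3]
print(outer["key"].tolist())      # [1, 2, 3, 4]

# Stack vertically, keeping only the columns both frames share
stacked = pd.concat([left, right], axis=0, join="inner")
print(stacked.columns.tolist())   # ['key']
print(len(stacked))               # 6
```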


Reshaping data is a common task in data analysis. The `melt()` function converts a wide DataFrame into a long one: the columns listed in `id_vars` are kept as identifiers, and the columns listed in `value_vars` are unpivoted into two columns holding the variable name and its value. The `pivot()` method does the reverse: `index` chooses the column whose values become the new row labels, `columns` chooses the column whose values become the new column headers, and `values` chooses the column that fills the table.

```python
# Create a DataFrame to reshape
data = {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)

# Melt the DataFrame from wide to long format
df_melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'], var_name='variable', value_name='value')
df_melted

# Pivot it back from long to wide format
df_pivot = df_melted.pivot(index='A', columns='variable', values='value')
df_pivot
```

| A | B | C  |
|---|---|----|
| 1 | 5 | 9  |
| 2 | 6 | 10 |
| 3 | 7 | 11 |
| 4 | 8 | 12 |


How you deal with missing data is an important part of data analysis. The `dropna()` method drops rows (or, with `axis=1`, columns) that contain missing values. By default (`how='any'`) a row is dropped if any of its values is missing; with `how='all'` it is dropped only if all of its values are missing. The `fillna()` method replaces missing values with the value you pass via the `value` parameter (a scalar, or a dict or Series giving one value per column). `fillna()` also has a `method` parameter for forward- or backward-filling, but recent pandas versions deprecate it in favor of the `ffill()` and `bfill()` methods.

In general, be cautious about using `dropna()`, as dropping rows with missing values can lead to data loss. Filling missing values with `fillna()` is often the better choice because it keeps every row and makes your assumption explicit, although the filled values (such as a column mean) are only estimates, so choose them deliberately.

```python
# Create a DataFrame with missing values
data = {'A': [1, 2, 3, 4], 'B': [5, 6, None, 8], 'C': [9, 10, 11, 12]}
df = pd.DataFrame(data)

# Fill missing values with the mean of each column
df.fillna(df.mean(), inplace=True)
df

# Drop rows with missing values
# (nothing is dropped here, because fillna() already replaced the missing value)
df.dropna(inplace=True)
df

# Create a DataFrame with duplicate rows
data = {'A': [1, 2, 3, 4, 1], 'B': [5, 6, 7, 8, 5], 'C': [9, 10, 11, 12, 9]}
df = pd.DataFrame(data)

# Drop duplicate rows (the last row repeats the first)
df.drop_duplicates(inplace=True)
df
```

|   | A | B | C  |
|---|---|---|----|
| 0 | 1 | 5 | 9  |
| 1 | 2 | 6 | 10 |
| 2 | 3 | 7 | 11 |
| 3 | 4 | 8 | 12 |
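
A minimal sketch of the `how` parameter and of forward-filling, on a small made-up DataFrame where one row is complete, one is partly missing, and one is entirely missing:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, np.nan],
                   "B": [4.0, 5.0, np.nan]})

print(len(df.dropna()))           # 1: how='any' drops rows with any NaN
print(len(df.dropna(how="all")))  # 2: only the all-NaN row is dropped

# Forward-fill: replace each NaN with the last valid value above it
filled = df.ffill()
print(filled)
```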


### File I/O

In Python, you can read and write files using the built-in `open()` function. The mode argument opens a file for reading (`'r'`, the default), writing (`'w'`, which overwrites the file), or appending (`'a'`). Adding `'b'` opens the file in binary mode instead of the default text mode; text mode also handles the different newline conventions of each operating system for you. You can use the `read()` method to read the entire contents of a file, the `readline()` method to read a single line, and the `readlines()` method to read all lines into a list.

You can use the `write()` method to write data to a file, the `writelines()` method to write a list of lines to a file, and the `flush()` method to flush the write buffer to the file. You can use the `close()` method to close a file. You can also use the `with` statement to open a file and automatically close it when you are done with it.
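
A minimal sketch of these methods, writing to a temporary file so it does not touch your project folders (the file name is arbitrary). Note that `writelines()` does not add newline characters, so each string carries its own `\n`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# 'w' creates (or overwrites) the file; `with` closes it automatically
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

# 'a' appends to the end of the existing file
with open(path, "a", encoding="utf-8") as f:
    f.write("appended line\n")

# Read one line, then the remaining lines as a list
with open(path, encoding="utf-8") as f:
    first = f.readline()
    rest = f.readlines()

print(repr(first))  # 'first line\n'
print(rest)         # ['second line\n', 'third line\n', 'appended line\n']
```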

But I suspect most of you will mainly deal with CSV files. You can read and write CSV files in Python using the built-in `csv` module. You can use the `csv.reader()` function to read a CSV file, and the `csv.writer()` function to write a CSV file. You can also use the `DictReader` and `DictWriter` classes to read and write CSV files as dictionaries. You can even directly read a CSV file into a Pandas DataFrame using the `pd.read_csv()` function.
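
The sketch below round-trips two made-up rows through `csv.DictWriter` and `csv.DictReader`. It uses an in-memory `io.StringIO` buffer instead of a real file so it is self-contained; with a real file you would open it with `newline=''`, as the `csv` documentation recommends. Note that every value comes back as a string:

```python
import csv
import io

rows = [{"name": "John", "age": 15}, {"name": "Mary", "age": 16}]

# Write the rows, with a header line, into an in-memory text buffer
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)

# Rewind the buffer and read the rows back as dictionaries
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(loaded)  # [{'name': 'John', 'age': '15'}, {'name': 'Mary', 'age': '16'}]
```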

To output a DataFrame to a CSV file, use the `to_csv()` method, which lets you set the file path, the delimiter (`sep`), whether to write the index (`index`), and other options. You can also output a DataFrame to an Excel file using the `to_excel()` method, where you can set the file path, the sheet name (`sheet_name`), and other options; writing `.xlsx` files requires an Excel engine such as `openpyxl` to be installed.

If you use Visual Studio Code, you can use its built-in file explorer to navigate to the file you want to open, and you can install extensions to better visualize the data in the file.

```python
import os
import random
import pandas as pd

filename = '../data/random.csv'

# Generate some data and write it to a CSV file if it doesn't already exist
if not os.path.exists(filename):
    # Create the ../data folder first if it is missing
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    data = {'A': [random.normalvariate(0, 1) for _ in range(1000)], 'B': [random.normalvariate(0, 1) for _ in range(1000)]}
    df = pd.DataFrame(data)
    df.to_csv(filename)  # the index is written as the first column

# Read the data from the CSV file, using the first column as the index
df = pd.read_csv(filename, index_col=0)
df.head()
```

|   | A         | B         |
|---|-----------|-----------|
| 0 | 1.466562  | -0.827942 |
| 1 | -0.348798 | 1.965577  |
| 2 | 1.212117  | 0.768392  |
| 3 | -1.030791 | -1.32786  |
| 4 | -0.119483 | 0.742211  |


### Statsmodels

Statsmodels is a Python module that provides classes and functions for estimating many different statistical models, conducting statistical tests, and exploring statistical data. An extensive list of result statistics is available for each estimator, and the results are tested against existing statistical packages to ensure that they are correct.

Statsmodels is built on top of NumPy and pandas, so it works directly with NumPy arrays and pandas DataFrames. Its models include linear regression, logistic regression, and time series models, and its tests include hypothesis tests and goodness-of-fit tests.

To estimate a linear regression, create a model with `sm.OLS(y, X)` and call its `fit()` method. `fit()` returns a results object: `summary()` displays the estimates and diagnostics, `predict()` makes predictions, `t_test()` runs t-tests on the coefficients, and `f_test()` runs F-tests on (joint) hypotheses about the coefficients. Note that `OLS` does not add an intercept on its own; use `sm.add_constant()` to add a constant column to `X` first.

I will not be able to cover all the functionality of Statsmodels, because that would mean covering every statistical model it provides. For the purposes of this demonstration, I will only cover linear regression. You can read the documentation for Statsmodels [here](https://www.statsmodels.org/stable/index.html).

In [126]:
from sklearn.datasets import load_diabetes
import pandas as pd
import statsmodels.api as sm

# Load the 'diabetes' dataset
# (load_diabetes returns mean-centered, scaled features by default,
# which is why the coefficients below are large)
diabetes = load_diabetes()

# Create a DataFrame with the feature data
df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Add the target variable to the DataFrame
df['disease_progression'] = diabetes.target

# Regress disease progression on age
X = df['age']
y = df['disease_progression']

# Add an intercept column; OLS does not include one by default
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
model.summary()


| | | | |
|---|---|---|---|
| Dep. Variable: | disease_progression | R-squared: | 0.035 |
| Model: | OLS | Adj. R-squared: | 0.033 |
| Method: | Least Squares | F-statistic: | 16.1 |
| Date: | Tue, 16 Jul 2024 | Prob (F-statistic): | 7.06e-05 |
| Time: | 01:38:04 | Log-Likelihood: | -2539.2 |
| No. Observations: | 442 | AIC: | 5082.0 |
| Df Residuals: | 440 | BIC: | 5091.0 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 152.1335 | 3.606 | 42.192 | 0.000 | 145.047 | 159.220 |
| age | 304.1831 | 75.806 | 4.013 | 0.000 | 155.196 | 453.170 |

| | | | |
|---|---|---|---|
| Omnibus: | 52.996 | Durbin-Watson: | 1.921 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 26.909 |
| Skew: | 0.438 | Prob(JB): | 1.43e-06 |
| Kurtosis: | 2.167 | Cond. No. | 21.0 |


## Application Programming Interface (API)

An application programming interface (API) is a set of rules and protocols that allows one software application to interact with another. APIs are widely used in modern software development to enable communication between different applications, services, or components. In web development, APIs allow different web services to exchange data and perform actions on behalf of a user or another system.

In Python, we can use libraries such as **Flask** and **Django** to create APIs, and libraries such as **requests** and **urllib** to interact with APIs provided by others. The term is not limited to the web, either: packages like **statsmodels** and **scikit-learn** are themselves used through their Python APIs, the public classes and functions they expose.

---

### Flask

**Flask** is a micro web framework written in Python. It is classified as a microframework because it does not require particular tools or libraries. Flask does not provide a database abstraction layer or form validation by default, instead relying on extensions to add these features as needed. This lightweight and modular design makes Flask easy to learn and use, especially for small to medium-sized web applications and APIs.

Flask provides classes and functions for:
- Creating web applications, including routes, views, and templates
- Handling HTTP requests and responses, such as request objects, response objects, and error handlers

For those interested in web development, Flask is a popular entry point for building web applications, APIs, and microservices. Flask is easy to learn and well-documented. You can read the documentation for Flask [here](https://flask.palletsprojects.com/en/2.0.x/).

If you're not focusing on web development, you will mostly be on the other side: calling APIs that others provide and working with the data they return from Python.

---

### Working with APIs in Python

You can use the **requests** library to interact with APIs from Python. It provides a function for each common HTTP method:
- `get()`: Send a GET request to retrieve data from an API.
- `post()`: Send a POST request to submit data to an API.
- `put()`: Send a PUT request to update data in an API.
- `delete()`: Send a DELETE request to remove data from an API.
- `patch()`: Send a PATCH request to partially update data.

Example usage:


In [1]:
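import requests

# Make a GET request to the JSONPlaceholder API (a free fake REST API for testing)
response = requests.get('https://jsonplaceholder.typicode.com/posts/1', timeout=10)

# Raise an exception if the server returned an error status (4xx/5xx)
response.raise_for_status()

# Parse the JSON response body into a Python dict
data = response.json()

# Print the data
print(data)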

{'userId': 1, 'id': 1, 'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}


APIs are a great way to access data from the web, such as social media platforms, weather services, and financial services. Depending on the service, you can retrieve data in real time or in bulk, usually in a structured format such as JSON.

Rather than manually downloading data from the web, you can use APIs to automate the process. This saves time and effort, and it can also give you access to data that is not available through other means. That makes APIs an essential part of the data science toolkit.

When calling an API from a service, you will typically need to provide an API key: a unique identifier that authenticates your requests. You usually obtain one by signing up for an account with the service that provides the API, and then include it with every request.
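The exact mechanism varies by service: some expect the key in an `Authorization` header, others in a custom header or a query parameter, so check the provider's documentation. The sketch below assumes a hypothetical weather service at `api.example.com` that takes a bearer token, and builds the request without sending it so you can see what would be transmitted. Reading the key from an environment variable (here the made-up `WEATHER_API_KEY`) keeps it out of your source code.

```python
import os

import requests

# Hypothetical service: the URL, parameter name, and header scheme are placeholders
BASE_URL = "https://api.example.com/v1/weather"
API_KEY = os.environ.get("WEATHER_API_KEY", "demo-key")  # keep real keys out of your code

headers = {"Authorization": f"Bearer {API_KEY}"}
params = {"city": "Oslo"}

# Build the request without sending it, to inspect what would go over the wire
prepared = requests.Request("GET", BASE_URL, headers=headers, params=params).prepare()
print(prepared.url)                       # https://api.example.com/v1/weather?city=Oslo
print(prepared.headers["Authorization"])

# Against a real service you would send it directly:
# response = requests.get(BASE_URL, headers=headers, params=params, timeout=10)
# response.raise_for_status()
# data = response.json()
```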

Want to add ChatGPT to your application? OpenAI provides an API for that. Fair warning, though: it can get pretty expensive.

### JSON (JavaScript Object Notation)

**JSON (JavaScript Object Notation)** is a lightweight data-interchange format that is easy for humans to read and write, and easy for machines to parse and generate. JSON is language-independent and widely used as a standard for exchanging data between web services and applications.

In Python, you can use the built-in `json` module to work with JSON data:

- `json.loads()`: Parse JSON strings into Python objects.
- `json.dumps()`: Convert Python objects into JSON strings.
- `json.load()`: Read JSON data from a file.
- `json.dump()`: Write JSON data to a file.

**Example:**


In [2]:
import json

# Parse a JSON string into a Python dict
obj = json.loads('{"name": "Alice", "age": 30}')
print(obj['name'], obj['age'])

# Convert the Python object back into a JSON string
json_str = json.dumps(obj)
print(json_str)
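The cell above covers the string functions. For files, `json.dump()` and `json.load()` work the same way on an open file object; here is a minimal sketch using a made-up configuration dictionary in a temporary directory.

```python
import json
import os
import tempfile

# Hypothetical configuration to save
config = {"model": "ols", "features": ["age", "bmi"], "fit_intercept": True}
path = os.path.join(tempfile.mkdtemp(), "config.json")

# json.dump writes to an open file; indent=2 makes the file human-readable
with open(path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

# json.load reads it back into Python objects
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == config)                     # True
print(json.dumps(True), json.dumps(None))   # true null
```

Note that JSON has its own spellings for some values: Python's `True`, `False`, and `None` become `true`, `false`, and `null`.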

## Python Environments

A Python environment is a context in which you run Python code and includes the Python interpreter, the standard library, and any other modules or packages that you have installed. There are many different Python environments that you can use, including the system Python environment, virtual environments, and containerized environments.

---

### Virtual Environments

A **virtual environment** is an isolated environment that contains its own Python interpreter, standard library, and any additional modules or packages you install. Virtual environments are useful for managing dependencies and isolating projects from each other.

You can create a virtual environment using the built-in `venv` module, the third-party `virtualenv` module, or the `conda` package manager.

- **On Unix-like systems (macOS, Linux):** Activate with `source`, e.g. `source .venv/bin/activate` for an environment created with `python -m venv .venv`.
- **On Windows systems:** Run the activation script, e.g. `.venv\Scripts\activate`.
- **Deactivate on any system:** Use the `deactivate` command.

You can install packages in a virtual environment using the `pip` package manager and list installed packages with `pip list`.
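The `venv` module can also be driven from Python itself, which makes it easy to see what a virtual environment actually is. This sketch is roughly equivalent to running `python -m venv .venv`, except that it skips installing pip to keep it fast; the `.venv` directory name is just a common convention. It then asks the new environment's interpreter whether it is running inside a virtual environment (`sys.prefix` differs from `sys.base_prefix`).

```python
import os
import subprocess
import tempfile
import venv

env_dir = os.path.join(tempfile.mkdtemp(), ".venv")

# Create the environment; symlink the interpreter on Unix-like systems as the CLI does
venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

# The interpreter lives in bin/ on Unix-like systems and Scripts\ on Windows
if os.name == "nt":
    env_python = os.path.join(env_dir, "Scripts", "python.exe")
else:
    env_python = os.path.join(env_dir, "bin", "python")

# Ask the environment's own interpreter whether it runs inside a venv
result = subprocess.run(
    [env_python, "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())  # True
```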

---

### Managing Environments with Conda

You can also use the **conda** package manager (Anaconda or Miniconda) to create and manage environments.

- Create an environment: `conda create --name myenv`
- Activate the environment: `conda activate myenv`
- Deactivate: `conda deactivate`
- Install packages: `conda install package_name`
- List packages: `conda list`

If you encounter an error installing a package, it may not be available in the current channel. You can:
- Try using a different package index or installing from source
- Add channels with `conda config --add channels channel_name`
- View channels: `conda config --show channels`
- Remove channels: `conda config --remove channels channel_name`

---

### Python Interpreter in Visual Studio Code

To easily switch between Python interpreters in VS Code:
- Use the **`Python: Select Interpreter`** command to select the interpreter for your project.
- Use **`Python: Create Terminal`** to open a new terminal with the selected interpreter.

---

### Example: Creating and Managing a Conda Environment

```bash
# Create and activate a conda environment
conda create --name myenv
conda activate myenv

# Install packages
conda install numpy pandas matplotlib

# List packages
conda list

# Deactivate the virtual environment
conda deactivate
```


### Package Requirements 

When sharing your code with others, you may want to include a `requirements.txt` file that lists all the packages that are required to run your code. You can create a `requirements.txt` file by running the `pip freeze` command. You can install the packages listed in a `requirements.txt` file by running the `pip install -r requirements.txt` command.

```bash
# Create a requirements.txt file
pip freeze > requirements.txt

# Install packages from a requirements.txt file
pip install -r requirements.txt
```


### Docker

Docker is a platform for developing, shipping, and running applications in containers. A **container** is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries, and settings. Containers isolate software from their environment, ensuring that applications work uniformly regardless of where they are deployed.

You can create a Docker container using a **Dockerfile**, which is a text file containing the instructions for building the container image.  
- Build a container image with the `docker build` command.
- Run a container using the `docker run` command.
- List running containers with `docker ps`.
- Stop a running container using the `docker stop` command.

For more complex applications involving multiple containers, you can use **Docker Compose** (the `docker-compose` tool, also available as the `docker compose` subcommand in newer Docker releases).  
- Define a multi-container application with a `docker-compose.yml` file (YAML format).
- Start all defined containers with `docker-compose up`.
- Stop and remove all containers with `docker-compose down`.

**Common Docker Commands:**
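```bash
# Build a Docker image from the Dockerfile in the current directory and tag it
docker build -t myimage .

# Run a container from the image in the background, giving it a name
docker run -d --name mycontainer myimage

# List running Docker containers
docker ps

# Stop the running container by name
docker stop mycontainer
```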


### Common Unix Commands

Here are some common Unix commands that you may find useful when working with Python environments:

1. `ls`  
    List the files and directories in the current directory.

2. `cd`  
    Change the current directory.

3. `pwd`  
    Print the current working directory.

4. `mkdir`  
    Create a new directory.

5. `rm`  
    Remove a file (use `rm -r` to remove a directory and its contents).

For a more comprehensive list of Unix commands, you can refer to the following resources:

- [Unix Tutorial](https://www.tutorialspoint.com/unix/index.htm)
- [Unix Commands](https://www.geeksforgeeks.org/unix-commands/)
- [Unix Commands Cheat Sheet](https://www.linuxtrainingacademy.com/linux-commands-cheat-sheet/)
- [Unix Commands Reference](https://www.lifewire.com/unix-commands-4094200)
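Python's standard library has counterparts to each of these commands in `os` and `shutil`, which is handy when a script needs to manage files itself. This sketch runs them inside a temporary directory; the `project` folder and `notes.txt` file are made up for illustration.

```python
import os
import shutil
import tempfile

start = os.getcwd()              # pwd: remember where we started
workdir = tempfile.mkdtemp()     # a throwaway directory to experiment in
os.chdir(workdir)                # cd
try:
    os.mkdir("project")          # mkdir
    with open(os.path.join("project", "notes.txt"), "w") as f:
        f.write("hello\n")
    print(os.listdir("."))           # ls          -> ['project']
    print(os.listdir("project"))     # ls project  -> ['notes.txt']
    os.remove(os.path.join("project", "notes.txt"))  # rm
    shutil.rmtree("project")         # rm -r (the directory and anything in it)
    print(os.listdir("."))           # ls          -> []
finally:
    os.chdir(start)              # always return to the original directory
```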


## Code Segmentation

In Python, you can use code segmentation to break up your code into smaller, more manageable pieces. This makes your code easier to read, understand, and maintain. In a `.py` or `.ipynb` file, you can import functions from other files using the `import` statement or the `from ... import ...` statement. You can also use the `as` keyword to give an imported module or function an alias. To import all functions from a module, you can use the `*` wildcard.

Code segmentation is especially useful when working with a large codebase, as it allows you to organize your code into different files (modules). You can create a separate file for each module, and then import only the functions or classes that you need. This approach makes your code more modular and reusable, and also simplifies testing and debugging.

A typical workflow is to write and test code in a Jupyter notebook, then copy the finalized code into a Python file, and run that Python file in the terminal. This process allows you to iteratively develop and test code in the notebook environment, and then verify its correctness when run as a standalone script.

**Note:** If you're importing a function from a subfolder (package), include an `__init__.py` file in that subfolder. The file can be empty; it marks the subfolder as a regular package. Since Python 3.3, a folder without `__init__.py` can still be imported as a namespace package, but adding the file is the conventional and more predictable choice.

**Example:**
```python
# Import an entire module
import mymodule

# Import a specific function
from mymodule import myfunction

# Import with an alias
import mymodule as mm

# Import all public names (not recommended: it can silently shadow existing names)
from mymodule import *
```


You would also usually include a `requirements.txt` file in the root directory of your project. This file contains a list of all the packages that your project depends on. You can use the `pip install -r requirements.txt` command to install all the packages listed in the `requirements.txt` file.

Be sure to keep this file up to date as you add or remove packages from your project. This can help you keep track of the packages that your project depends on, and it can also help you ensure that your project runs on different systems. You can use the `pip freeze` command to generate a `requirements.txt` file from the packages that are installed in your environment.
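The following sketch builds a tiny package on the fly so the import forms and the `__init__.py` note can be tried out in one go. The package name `mypkg`, the module `stats.py`, and its `mean()` function are hypothetical; in a real project these would simply be files in your repository rather than being written from code.

```python
import importlib
import os
import sys
import tempfile
import textwrap

# Hypothetical project root containing a package folder "mypkg"
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.mkdir(pkg_dir)

# An empty __init__.py marks mypkg/ as a regular package
with open(os.path.join(pkg_dir, "__init__.py"), "w"):
    pass

# A module inside the package
with open(os.path.join(pkg_dir, "stats.py"), "w") as f:
    f.write(textwrap.dedent("""\
        def mean(values):
            return sum(values) / len(values)

        if __name__ == "__main__":
            # Runs only when the file is executed directly, not when imported
            print(mean([1, 2, 3]))
    """))

# Make the project root importable and clear import caches for the new files
sys.path.insert(0, root)
importlib.invalidate_caches()

from mypkg.stats import mean   # import a specific function
import mypkg.stats as st       # import a module with an alias

print(mean([1, 2, 3, 4]))  # 2.5
print(st.mean([10, 20]))   # 15.0
```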

## Useful Plugins

I mainly use Visual Studio Code for my Python development, so here are some useful plugins that can enhance your Python development experience:

1. **Python**
    - The official Python extension for Visual Studio Code. It provides features such as IntelliSense (code completion), linting, debugging, code navigation, code formatting, and unit testing.

2. **Jupyter**
    - An extension for working with Jupyter notebooks directly in Visual Studio Code. It supports interactive notebooks with features like IntelliSense, code formatting, debugging, and visualization.

3. **indent-rainbow**
    - This extension colorizes code indentation levels, making it easier to see code structure and improving readability, especially in deeply nested code.

4. **Data Wrangler**
    - A powerful extension for working with data. It supports data visualization, data exploration, data cleaning, and transformation—all within Visual Studio Code.

---

You can install these plugins by searching for them in the Visual Studio Code Marketplace.  
Alternatively, use the terminal with the `code` command:

```bash
# Install by extension ID from the Marketplace
code --install-extension ms-python.python
code --install-extension ms-toolsai.jupyter
code --install-extension oderwat.indent-rainbow
code --install-extension ms-toolsai.datawrangler

# Install from a .vsix file
code --install-extension path/to/extension.vsix
```


## Markdown

Markdown is a lightweight markup language with plain-text-formatting syntax. Its design allows it to be converted to many output formats, but the original tool by the same name only supports HTML. Markdown is often used to format README files, write messages in online discussion forums, and create rich text using a plain text editor.

You can use Markdown to format text, add images, create lists, and add links. Markdown supports creating headings, paragraphs, blockquotes, code blocks, tables, lists, and task lists. It also allows for footnotes, definition lists, and abbreviations (with some implementations).

Markdown is built directly into Jupyter notebooks, so you can use it to format your text, create headings, paragraphs, lists, and links, and also to add tables, images, and code blocks—all within your notebook cells.

You can also add LaTeX equations to your Jupyter notebooks using Markdown. LaTeX can be used to create mathematical equations, symbols, fractions, integrals, matrices, Greek letters, arrows, and other advanced mathematical expressions.
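For example, the following LaTeX (placed between `$$` delimiters in a Markdown cell) combines a fraction, a sum, Greek letters, a matrix, and an arrow. The formulas themselves are just familiar examples chosen for illustration: the OLS estimator, a sample mean, a 2 by 2 covariance matrix, and a limit.

```latex
\hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad
\lim_{n \to \infty} \frac{1}{n} = 0
```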

**Here's an example:**

```markdown
# Heading 1
## Heading 2
### Heading 3

- List item

1. Numbered item

[Link](https://www.example.com)

![Heatmap](graphs/heatmap.png)

<br>
<br>

$$
\int_{a}^{b} x^2 \, dx
$$
```
