# Intermediate Python

Now that we understand the basics of programming in Python, let's learn about its applications to FinTech.
This workshop is based on an original workshop developed by Jacob Kulik and David Pogrebitskiy.

# What you'll be able to do after this workshop

- Understand intermediate features of Python and their applications to FinTech
  - Using pandas to manipulate data
  - Using the Yahoo Finance API to get stock data
  - Plotting data with matplotlib


# Quick Review


In [None]:
# Variables, Data Types, and Data Structures

x = 7

name = "Jane"

ex_list = ["Disrupt", "Fintech", "Initiative"]

ex_dict = {"AAPL": 125.07, "MSFT": 239.58, "META": 124.74}


In [None]:
# indexing

print(ex_list[0])

print(ex_dict["MSFT"])


In [None]:
# looping
for string in ex_list:
    print(string)

# looping over a dictionary gives us its keys (here, the tickers)
for ticker in ex_dict:
    print(f'{ticker} is trading at {ex_dict[ticker]}')
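
The dictionary loop above looks each price up by its key. As a minimal sketch of an alternative, the built-in `dict.items()` method hands back the key and the value together. The names `ticker`, `price`, and `lines` are only illustrative.

```python
ex_dict = {"AAPL": 125.07, "MSFT": 239.58, "META": 124.74}

# items() yields (key, value) pairs, so no second lookup is needed
lines = []
for ticker, price in ex_dict.items():
    lines.append(f"{ticker} is trading at {price}")

for line in lines:
    print(line)
```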


In [None]:
# functions


def hello_world():
    print("Hello World")


def say_hello(name):
    print(f'Hello {name}')


def ranker(list_of_strings):
    for i, string in enumerate(list_of_strings):
        print(f'{i+1}: {string}')


hello_world()
say_hello("John")
ranker(["John", "Jane", "Aayan", "Sofia", "Mark"])


# pandas

pandas is a Python library for data analysis. It provides data structures and operations for manipulating numerical tables and time series data. It is a fundamental high-level building block for doing practical, real world data analysis in Python.
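
As a rough illustration of the two data structures you will meet most often, the sketch below builds a `Series` (one labeled column of values) and a `DataFrame` (a table of columns). The prices reuse the numbers from the quick review; the variable names `prices` and `table` are assumptions made for this example.

```python
import pandas as pd

# A Series is a one-dimensional labeled array; here the labels are tickers
prices = pd.Series({"AAPL": 125.07, "MSFT": 239.58, "META": 124.74})

# A DataFrame is a table whose columns share one row index
table = pd.DataFrame({
    "ticker": ["AAPL", "MSFT", "META"],
    "price": [125.07, 239.58, 124.74],
})

print(prices["MSFT"])  # label-based lookup, much like a dictionary
print(table.shape)     # (number of rows, number of columns)
```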

### [Documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html)

In [None]:
# we alias pandas as pd to make it easier to call. think of it like a nickname
import pandas as pd

In [None]:
# motivations for using pandas instead of a dictionary

# we could store the data in a dictionary, but it would be difficult to work with
stock_market_data = {
    'AAPL': {
        'date': ['2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17'],
        'adjclose': [153.199997, 155.330002, 153.710007, 152.550003],
        'close': [153.199997, 155.330002, 153.710007, 152.550003],
        'high': [153.770004, 155.5, 156.330002, 153.0],
        'low': [150.860001, 152.880005, 153.350006, 150.850006],
        'open': [152.119995, 153.110001, 153.509995, 152.350006],
        'volume': [61707600, 65669300, 68167900, 59095900]
    },
    'GOOG': {
        'date': ['2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17'],
        'adjclose': [94.949997, 97.099998, 95.779999, 94.589996],
        'close': [94.949997, 97.099998, 95.779999, 94.589996],
        'high': [95.175003, 97.339996, 97.879997, 95.75],
        'low': [92.650002, 94.360001, 94.970001, 93.449997],
        'open': [94.660004, 94.739998, 95.540001, 95.070000],
        'volume': [42513100, 37029900, 35642100, 31074100]
    },
    'MSFT': {
        'date': ['2023-02-14', '2023-02-15', '2023-02-16', '2023-02-17'],
        'adjclose': [249.800003, 252.949997, 252.789993, 253.600006],
        'close': [249.800003, 252.949997, 252.789993, 253.600006],
        'high': [251.399994, 254.899994, 253.729996, 255.970001],
        'low': [248.210007, 250.970001, 251.270004, 253.0],
        'open': [248.210007, 251.0, 252.539993, 253.5],
        'volume': [20331600, 26144900, 15917000, 16853700]
    }
}



In [None]:
# let's convert this dictionary into a pandas dataframe
stock_market_df = pd.concat({ticker: pd.DataFrame.from_dict(data)
                             for ticker, data in stock_market_data.items()}, 
                            axis=0, names=['ticker']).reset_index(level=1, drop=True).reset_index()

stock_market_df
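
The conversion above packs several steps into one expression. Here is a sketch that spells the same steps out one at a time on a tiny made-up dataset (`toy_data`, with invented tickers and prices), so each intermediate result can be inspected.

```python
import pandas as pd

# Toy data in the same nested layout as stock_market_data (values are made up)
toy_data = {
    "AAA": {"date": ["2023-01-02", "2023-01-03"], "close": [10.0, 11.0]},
    "BBB": {"date": ["2023-01-02", "2023-01-03"], "close": [20.0, 19.5]},
}

# Step 1: build one small DataFrame per ticker
frames = {ticker: pd.DataFrame.from_dict(data) for ticker, data in toy_data.items()}

# Step 2: stack them vertically; the dict keys become an outer index level named 'ticker'
stacked = pd.concat(frames, axis=0, names=["ticker"])

# Step 3: drop the inner per-ticker row numbers (index level 1)
no_row_numbers = stacked.reset_index(level=1, drop=True)

# Step 4: turn the remaining 'ticker' index into an ordinary column
toy_df = no_row_numbers.reset_index()

print(toy_df)
```

Printing `stacked` and `no_row_numbers` along the way is a good way to see what each `reset_index` call changes.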

In [None]:
# let's get all the tickers in our dataframe
stock_market_df['ticker']

In [None]:
# there's a lot of duplicate values, so we can use the unique() method to get a list of unique values
stock_market_df['ticker'].unique()

# how many unique tickers are in our dataframe?
len(stock_market_df['ticker'].unique())
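
Two related Series methods may also be handy here: `nunique()` counts the distinct values directly, and `value_counts()` shows how many rows each value has. A small sketch on a made-up Series (the name `tickers` is an assumption for this example):

```python
import pandas as pd

tickers = pd.Series(["AAPL", "AAPL", "GOOG", "MSFT", "MSFT", "MSFT"], name="ticker")

# number of distinct values, equivalent to len(tickers.unique())
n_unique = tickers.nunique()

# how many rows each distinct value appears in
counts = tickers.value_counts()

print(n_unique)
print(counts)
```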

In [None]:
# let's get the row with index label 10 (labels start at 0, so this is the 11th row)
stock_market_df.loc[10]
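
`.loc` selects by index label, while `.iloc` selects by position. The two agree on a freshly built dataframe whose labels run 0, 1, 2, ..., but they can differ after filtering. A minimal sketch on a made-up dataframe (`df` and `subset` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "ticker": ["AAPL", "GOOG", "MSFT"],
    "close": [152.55, 94.59, 253.60],
})

# Keep only two rows; the surviving index labels are 0 and 2
subset = df[df["close"] > 100]

by_label = subset.loc[2]      # the row whose index label is 2
by_position = subset.iloc[1]  # the second row by position

print(by_label["ticker"], by_position["ticker"])
```

Here both lookups land on the same row, but `subset.loc[1]` would raise a `KeyError` because label 1 was filtered out.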

In [None]:
# let's get the data for the MSFT ticker
msft_data = stock_market_df[stock_market_df['ticker'] == 'MSFT']
msft_data


In [None]:
# let's get the data for the MSFT and the GOOG tickers
msft_and_goog_data = stock_market_df[stock_market_df['ticker'].isin(['MSFT', 'GOOG'])]
msft_and_goog_data


In [None]:
# what are the data types of each column in our dataframe?
stock_market_df.dtypes

In [None]:
# let's convert the date column to a datetime type. this will let us filter, sort, and plot by date
stock_market_df['date'] = pd.to_datetime(stock_market_df['date'])
stock_market_df.dtypes
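
To give a flavor of what a datetime column makes possible, here is a small sketch using the AAPL closing prices from above. It relies on the `.dt` accessor and a comparison against `pd.Timestamp`; the names `aapl`, `weekday`, and `from_the_15th` are assumptions made for this example.

```python
import pandas as pd

aapl = pd.DataFrame({
    "date": ["2023-02-14", "2023-02-15", "2023-02-16", "2023-02-17"],
    "close": [153.199997, 155.330002, 153.710007, 152.550003],
})
aapl["date"] = pd.to_datetime(aapl["date"])

# the .dt accessor exposes parts of each date, such as the weekday name
aapl["weekday"] = aapl["date"].dt.day_name()

# dates can be compared directly, which makes range filters easy
from_the_15th = aapl[aapl["date"] >= pd.Timestamp("2023-02-15")]

print(aapl)
print(from_the_15th)
```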

In [None]:
# which rows have highs greater than 200?
greater_than_200 = stock_market_df[stock_market_df['high'] > 200]
greater_than_200


In [None]:
# which rows closed lower than they opened?
closed_lower_than_open = stock_market_df[stock_market_df['close'] < stock_market_df['open']]
closed_lower_than_open
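
Conditions like the ones above can be combined. The sketch below uses a made-up dataframe (`toy`, with invented tickers and prices) to show `&` (both conditions must hold) and `|` (at least one must hold). Each condition needs its own parentheses, because `&` and `|` bind more tightly than comparisons such as `==` and `>`.

```python
import pandas as pd

toy = pd.DataFrame({
    "ticker": ["AAA", "AAA", "BBB", "BBB"],
    "open": [10.0, 11.0, 20.0, 19.0],
    "close": [11.0, 10.5, 19.5, 21.0],
})

# rows for BBB on days it closed higher than it opened
bbb_up_days = toy[(toy["ticker"] == "BBB") & (toy["close"] > toy["open"])]

# rows that are either AAA or closed above 20
aaa_or_big_close = toy[(toy["ticker"] == "AAA") | (toy["close"] > 20)]

print(bbb_up_days)
print(aaa_or_big_close)
```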

In [None]:
# try to get the data for the AAPL ticker on 2023-02-14 (hint: you'll need to use logical and (represented by the & operator))


In [None]:
# try to get the data for the AAPL ticker on 2023-02-14 and the GOOG ticker on 2023-02-15 (hint: you'll need to use logical or (represented by the | operator))

In [None]:
# try writing your own conditional!

# Yahoo Finance API

### [Documentation](https://pypi.org/project/yfinance/)

### Motivation for using the API
We have a static dataset of stock data, but what if we want the most recent data?
Should we have to go to a website and enter the data by hand? Manual entry is error-prone and becomes tedious when there are hundreds of tickers with hundreds of data points each.
Instead of entering this data manually, we can use the Yahoo Finance API!

In [None]:
# since yfinance is an external library, we need to install it first
!pip install yfinance

In [None]:
import yfinance as yf

In [None]:
# let's make the same dataframe using yfinance
tickers = ["AAPL", "MSFT", "GOOG"]
start_date = "2023-02-14"
end_date = "2023-02-18"

# group_by='ticker' puts the ticker on the top level of the column index
yf_df = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')

yf_df

In [None]:
# the format isn't the same. let's reshape this dataframe to look like our stock_market_df so it's easier to work with
yf_df = yf_df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

yf_df
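
The reshaping step can be tried without a network connection. The sketch below builds a small made-up dataframe with two-level columns (ticker on top, price field underneath), which is assumed here to mirror the layout of a download grouped by ticker, and then applies the same `stack`, `rename_axis`, and `reset_index` chain. The names `wide` and `long_df` and all values are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(["2023-02-14", "2023-02-15"])
columns = pd.MultiIndex.from_product([["AAA", "BBB"], ["Open", "Close"]])

# wide layout: one row per date, one (ticker, field) column pair per value
wide = pd.DataFrame(
    [[10.0, 11.0, 20.0, 19.5],
     [11.0, 10.5, 19.5, 21.0]],
    index=dates,
    columns=columns,
)

# move the ticker level of the columns into the row index
long_df = wide.stack(level=0)

# name the two index levels, then turn the ticker level into a regular column
long_df = long_df.rename_axis(["Date", "Ticker"]).reset_index(level=1)

print(long_df)
```

Depending on the pandas version, `stack` may print a `FutureWarning` about its upcoming implementation; for complete data like this the reshaped table has the same contents either way.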

# Putting it all Together: Visualizations with matplotlib

### [Documentation](https://matplotlib.org/stable/api/index)

Now that we've retrieved our data, let's visualize it with matplotlib, a library for plotting data!



In [None]:
import matplotlib.pyplot as plt

In [None]:
# let's get the data for the MSFT ticker and store it in a variable
msft = yf_df[yf_df['Ticker'] == 'MSFT']
msft

In [None]:
# let's then extract the dates and the close prices from the dataframe and store them in variables

# since date is the index of our dataframe, we can use the index to get the date
date = msft.index
close = msft['Close']

In [None]:
# let's plot our data!

# create the figure and axes, draw the line, then label the plot
fig, ax = plt.subplots()

ax.plot(date, close)

ax.set_xlabel('Date')

ax.set_ylabel('Close Price')

ax.set_title('Microsoft (MSFT) Close Prices')

ax.set_xticks(date)

In [None]:
# let's plot the difference between the high and low prices of GOOG for each day
goog = yf_df[yf_df['Ticker'] == 'GOOG']

date = goog.index
high = goog['High']
low = goog['Low']

fig, ax = plt.subplots()

ax.fill_between(date, high, low, color='blue', alpha=0.3)

ax.set_xlabel('Date')

ax.set_ylabel('Price')

ax.set_title('Google (GOOG) High vs Low Prices')

ax.set_xticks(date)


In [None]:
# what if your boss asked you to plot the difference between the high and low prices for all the stocks in our dataframe?
# we would have to write a lot of code to do this. let's write a function to do this for us

def plot_high_vs_low(ticker):
    stock_data = yf_df[yf_df['Ticker'] == ticker]

    date = stock_data.index
    close = stock_data['Close']
    high = stock_data['High']
    low = stock_data['Low']

    fig, ax = plt.subplots()

    ax.plot(date, close)

    ax.fill_between(date, high, low, color='blue', alpha=0.3)

    ax.set_xlabel('Date')

    ax.set_ylabel('Price')

    ax.set_title(f'{ticker} Close Price with High-Low Range')

    ax.set_xticks(date)
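
Once the plotting logic lives in a function, covering every ticker is just a loop. The sketch below is a self-contained variant that runs on made-up data: `plot_range` is a hypothetical version of the function above that takes the dataframe as an argument and returns the axes, and `sample` stands in for `yf_df`.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data in the same long layout as yf_df (Date index, Ticker column)
sample = pd.DataFrame(
    {
        "Ticker": ["AAA", "AAA", "BBB", "BBB"],
        "High": [11.5, 11.2, 20.4, 21.3],
        "Low": [9.8, 10.1, 19.0, 18.7],
        "Close": [11.0, 10.5, 19.5, 21.0],
    },
    index=pd.to_datetime(["2023-02-14", "2023-02-15", "2023-02-14", "2023-02-15"]),
)


def plot_range(df, ticker):
    """Plot one ticker's close price with its high-low range shaded (hypothetical helper)."""
    stock_data = df[df["Ticker"] == ticker]
    fig, ax = plt.subplots()
    ax.plot(stock_data.index, stock_data["Close"])
    ax.fill_between(stock_data.index, stock_data["High"], stock_data["Low"], alpha=0.3)
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    ax.set_title(f"{ticker} Close Price with High-Low Range")
    return ax


# one figure per ticker, without repeating any plotting code
axes = {ticker: plot_range(sample, ticker) for ticker in sample["Ticker"].unique()}
```

Passing the dataframe in as an argument, rather than reading a global variable, makes the function easier to reuse with other datasets.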

In [None]:
# your turn! write a function that plots the volume of a stock over time

In [None]:
# write a function that plots the price ratio (closing price/opening price) of a stock over time. 

In [None]:
# challenge! investigate another ratio that you think might be interesting to plot.

In [None]:
# challenge! write a function that plots the difference between the close and adjusted close prices of a stock over time