1. Quarter-long Project

* Learning Objectives:
    + Filtering and displaying financial data
    + Using your newly acquired Python skills on an actual dataset
    + Working example: https://jhsivadas-stock-filter-filter-al7bq7.streamlit.app/
    
* Implementation: 
    + We will provide you with a CSV file of stock data
    + Create a stock selection filter by financial data
    + Build it using Python and Streamlit  

* Incentives:
    + The top two projects will serve as models for other PNG chapters
    + They will also be implemented with real-world data (we now have a PNG fund)

2. Intro to Numpy

NumPy is a library that we use when working with arrays. Its name stands for Numerical Python, and it allows us to
perform matrix transformations, filter arrays, and process large amounts of data in a seamless way. This library has
many applications in data science, machine learning, linear algebra, finance, and statistics. </br>

Installation:
```
pip install numpy
```

Learn more at: https://www.w3schools.com/python/numpy/numpy_intro.asp </br>
Examples: https://www.geeksforgeeks.org/python-numpy/ </br>

In [None]:
# Importing numPy
import numpy as np

# Creating an array from a list
array1 = np.array([3, 8, 10, 12])
print("Numpy array with rank 1 \n", array1)
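# Every row must have the same length; recent NumPy versions raise an
# error for ragged nested lists
array2 = np.array([[10, 1, 5],
                   [6, 7, 99]])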
print("Numpy array with rank 2 \n", array2)

# Creating an array from a tuple
t_array = np.array((3, 7, 99, 10))
print("Array created using a tuple\n", t_array)

* Accessing an array index

In [None]:
# Initial Array
arr = np.array([[-1, 2, 0, 4],
                [4, -0.5, 6, 0],
                [2.6, 0, 7, 8],
                [3, -10, 9, 2.0]])


# Printing a range of the array
# using slicing: rows 0-1, every third column
sliced_arr = arr[:2, ::3]
print("Array with first 2 rows and"
      " columns 0 and 3:\n", sliced_arr)
 
# Printing elements at
# specific indices (valid row and column indices are 0 to 3)
index_arr = arr[[1, 0, 2, 3],
                [3, 2, 1, 0]]
print("\nElements at indices (1, 3), "
      "(0, 2), (2, 1), (3, 0):\n", index_arr)

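The introduction mentions filtering arrays, which is usually done with a boolean mask rather than with explicit indices. The snippet below is a minimal sketch of that idea; the `prices` values are made up for illustration and are not part of the project dataset.

```python
import numpy as np

# Hypothetical prices, made up for illustration
prices = np.array([12.5, 48.0, 7.25, 103.4, 55.1])

# Comparing an array with a scalar yields a boolean array (the mask)
mask = prices > 50
print(mask)

# Indexing with the mask keeps only the elements where the mask is True
print(prices[mask])

# Conditions combine with & (and) and | (or); the parentheses are required
in_range = prices[(prices > 10) & (prices < 60)]
print(in_range)
```

The same masking pattern carries over to Pandas, where it is used to filter the rows of a dataframe.
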
* Basic Array Operations

In [None]:
# Defining Array 1
a = np.array([[10, 7, 2],
              [4, 16, 11]])
 
# Defining Array 2
b = np.array([[4, 3, 1],
              [2, 1, 5.5]])
               
# Adding 1 to every element
print ("Adding 1 to every element:", a + 1)
 
# Subtracting 2 from each element
print ("\nSubtracting 2 from each element:", b - 2)
 
# sum of array elements
# Performing Unary operations
print ("\nSum of all array "
       "elements: ", a.sum())
 
# Adding two arrays
# Performing Binary operations
print ("\nArray sum:\n", a + b)

* Data types of a NumPy array
    + NumPy arrays usually hold numbers, but their elements can be of other data types as well. All elements
    of an array share a single dtype.
    + The values of a NumPy array are stored in what can be thought of as a contiguous block of memory, whose
    bytes are interpreted according to the array's dtype object.
    + When we create a NumPy array, NumPy will try to infer a data type, but functions that construct arrays
    also accept an argument (dtype) to specify the data type explicitly.

In [None]:
# Integer datatype guessed by Numpy
x = np.array([1, 6, 10])  
print("Integer Datatype: ")
print(x.dtype)         
 
# Float datatype guessed by Numpy
x = np.array([1.0, 2.0, 2.7]) 
print("\nFloat Datatype: ")
print(x.dtype)  
 
# Forcing a datatype (the float 4.6 is truncated to 4 when cast to int64)
x = np.array([1, 2, 4.6], dtype = np.int64)   
print("\nForcing a Datatype: ")
print(x, x.dtype)
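
To make the "block of memory interpreted by the dtype" idea more concrete, the sketch below inspects how many bytes an array occupies and converts it to another dtype with `astype`. The values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Three 64-bit integers: 8 bytes each
x = np.array([1, 2, 3], dtype=np.int64)
print(x.itemsize)  # bytes per element
print(x.nbytes)    # total bytes used by the data

# astype returns a new array with the requested dtype
y = x.astype(np.float32)
print(y.dtype, y.itemsize, y.nbytes)
```

Choosing a smaller dtype can reduce memory use on large datasets, at the cost of range or precision.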

* Mathematical Operations on a numPy array
    + Operations are performed element-wise
    + Two of the most commonly used operations are sum and transpose.
    + Transpose is used very often when performing matrix multiplications in
    machine learning models.

In [None]:

# First Array
arr1 = np.array([[4, 7], [2, 6], [5, 19]], 
                 dtype = np.float64)
                  
# Second Array
arr2 = np.array([[3, 6], [2, 8], [1, 8]], 
                 dtype = np.float64) 
 
# Addition of two Arrays
a_sum = np.add(arr1, arr2)
print("Adding two Arrays: ")
print(a_sum)
 
# Addition of all Array elements
# using the sum method
a_sum1 = np.sum(arr1)
print("\nAddition of Array elements: ")
print(a_sum1)
 
# Square root of an array
a_sqrt = np.sqrt(arr1)
print("\nSquare root of array1's elements: ")
print(a_sqrt)
 
# Transpose of the array using the built-in attribute 'T'
arr_t = arr1.T
print("\nTranspose of Array 1: ")
print(arr_t)
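
Since the transpose is mostly useful for matrix multiplication, here is a short sketch contrasting the element-wise product (`*`) with the matrix product (`@`). It redefines the same two arrays so that it can run on its own.

```python
import numpy as np

arr1 = np.array([[4, 7], [2, 6], [5, 19]], dtype=np.float64)
arr2 = np.array([[3, 6], [2, 8], [1, 8]], dtype=np.float64)

# Element-wise product: both arrays are 3x2, and so is the result
elementwise = arr1 * arr2
print(elementwise)

# Matrix product: (2x3) @ (3x2) gives a 2x2 result.
# arr1 @ arr2 would fail because the inner dimensions (2 and 3) do not match,
# which is why the transpose is needed here.
product = arr1.T @ arr2
print(product)
```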

Other NumPy array methods can be found in the table available here: https://www.geeksforgeeks.org/python-numpy/


3. Intro to Pandas

* Why Pandas?
    + Easy to manipulate data in a table-like manner
    + Suitable for very large datasets
    + Easy handling of missing data
    + Easy time series manipulation
    + Easy to merge and filter different tables
    
* Installation
```
    pip install pandas
```

* Series Object
    + One dimensional
    + Can hold data of any type
    + Supports both integer and label-based indexing

In [None]:
import pandas as pd
import numpy as np
 
# Creating an empty series (passing a dtype avoids a warning in some Pandas versions)
ser = pd.Series(dtype="float64")

# Printing the empty series
print(ser)
 
# Creating an array (NumPy converts the mixed values to strings,
# because all elements of an array share one dtype)
data = np.array([1, 2, 'j', 'l'])

# Converting the array to a series
ser = pd.Series(data)
print(ser)
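
The difference between label-based and integer (position-based) indexing can be sketched as follows. The ticker symbols and prices are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical prices keyed by made-up ticker symbols
prices = pd.Series([150.0, 320.5, 98.2], index=["AAA", "BBB", "CCC"])

# Label-based access uses .loc
print(prices.loc["BBB"])

# Position-based access uses .iloc
print(prices.iloc[0])

# Label slices include both endpoints; position slices exclude the stop
by_label = prices.loc["AAA":"BBB"]
by_position = prices.iloc[0:1]
print(by_label)
print(by_position)
```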

* Pandas Dataframe

A DataFrame is a two-dimensional, size-mutable tabular data structure whose columns can hold different data types.
Its data is organized into rows and columns. In practice, we often create tables by loading
data contained in a CSV or Excel file. 

In [None]:
import pandas as pd
   
# Calling DataFrame constructor
df = pd.DataFrame()
print(df)
 
# List of lists (each inner list becomes one row)
lst = [['Welcome', 'to', 'PNG'], ["Class", "of", 2026]]
   
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

* Practical Examples
    + Available at: https://www.digitalocean.com/community/tutorials/python-pandas-module-tutorial

In [None]:
# Creating a dataframe using a hashmap / dictionary

import pandas as pd
import numpy as np
df = pd.DataFrame({
    "State": ['Andhra Pradesh', 'Maharashtra', 'Karnataka', 'Kerala', 'Tamil Nadu'],
    "Capital": ['Hyderabad', 'Mumbai', 'Bengaluru', 'Trivandrum', 'Chennai'],
    "Literacy %": [89, 77, 82, 97,85],
    "Avg High Temp(c)": [33, 30, 29, 31, 32 ]
})
print(df)

In [None]:
# Creating a dataframe from a CSV file (in this case, the file "cities.csv" must be
# in your working directory)

import pandas as pd
data = pd.read_csv('cities.csv')
print(data)

In [None]:
# Displaying the first two rows of the dataframe
print(df.head(2))

# Displaying the last two rows of the dataframe
print(df.tail(2))

# Displaying summary statistics of the "Literacy %" column
print(df['Literacy %'].describe())

# Sorting literacy values
print(df.sort_values('Literacy %', ascending=False))
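
Handling missing data, listed above among the reasons to use Pandas, typically comes down to detecting, dropping, or filling missing values. A minimal sketch, assuming a made-up table with a hypothetical `PE` column:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value (np.nan)
df_missing = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC"],
    "PE": [15.2, np.nan, 22.1],
})

# Count the missing values in a column
n_missing = df_missing["PE"].isna().sum()
print(n_missing)

# Option 1: drop the rows where PE is missing
dropped = df_missing.dropna(subset=["PE"])
print(dropped)

# Option 2: fill the missing values, here with the column mean
filled = df_missing.fillna({"PE": df_missing["PE"].mean()})
print(filled)
```

Which option is appropriate depends on the data; filling with the mean is only one of several possible choices.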

* Slicing / filtering rows and columns

In [None]:
# Slicing / extracting data of a column
print(df['Capital'])
# Or use attribute (dot) notation, which only works when the
# column name is a valid Python identifier
print(df.Capital)

# Filtering data
print(df[df['Literacy %']>90])
print(df[df['State'].isin(['Karnataka', 'Tamil Nadu'])])
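
For the project's stock filter, several conditions usually need to be combined. The sketch below shows one way to do that. The column names (`PE`, `DividendYield`), the values, and the helper `filter_stocks` are assumptions made for illustration and may not match the CSV file you receive.

```python
import pandas as pd

# Hypothetical stock data; tickers and numbers are made up
stocks = pd.DataFrame({
    "Ticker": ["AAA", "BBB", "CCC", "DDD"],
    "Sector": ["Tech", "Energy", "Tech", "Health"],
    "PE": [18.0, 9.5, 35.0, 22.0],
    "DividendYield": [1.2, 4.1, 0.0, 2.5],
})

def filter_stocks(df, max_pe, min_yield):
    """Keep rows whose PE is at most max_pe and whose yield is at least min_yield."""
    # Each comparison yields a boolean Series; & combines them row by row
    mask = (df["PE"] <= max_pe) & (df["DividendYield"] >= min_yield)
    return df[mask]

result = filter_stocks(stocks, max_pe=25, min_yield=1.0)
print(result)
```

In a Streamlit app, the two thresholds could come from user inputs instead of being hard-coded.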

* Renaming columns

In [None]:
# Rename the "Literacy %" column to "Literacy percentage"
df.rename(columns = {'Literacy %':'Literacy percentage'}, inplace=True)
print(df.head())

* Manipulating Dataframes

In [None]:
import pandas as pd

# First dataframe
d1 = {  
    'Employee_id': ['1', '2', '3', '4', '5'],
    'Employee_name': ['Akshar', 'Jones', 'Kate', 'Mike', 'Tina']
}
df1 = pd.DataFrame(d1, columns=['Employee_id', 'Employee_name'])  
print(df1)

# Second dataframe
d2 = {  
    'Employee_id': ['4', '5', '6', '7', '8'],
    'Employee_name': ['Meera', 'Tia', 'Varsha', 'Williams', 'Ziva']
}
df2 = pd.DataFrame(d2, columns=['Employee_id', 'Employee_name'])  
print(df2)

* Merging dataframes

In [None]:
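# Inner join on Employee_id: only ids present in both dataframes ('4' and '5') are kept.
# Both dataframes have an 'Employee_name' column, so Pandas adds the suffixes _x and _y.
print(pd.merge(df1, df2, on='Employee_id'))
# merge returns a new dataframe; df2 itself is unchanged
print(df2)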
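
By default, `pd.merge` performs an inner join. The `how` parameter selects other join types, and `indicator=True` adds a `_merge` column showing where each row came from. A small sketch with made-up tables (the `Team` and `Office` columns are hypothetical):

```python
import pandas as pd

left = pd.DataFrame({
    "Employee_id": ["1", "2", "3"],
    "Team": ["Research", "Trading", "Risk"],
})
right = pd.DataFrame({
    "Employee_id": ["2", "3", "4"],
    "Office": ["Chicago", "New York", "Boston"],
})

# Inner join (the default): only ids found in both tables
inner = pd.merge(left, right, on="Employee_id")
print(inner)

# Outer join: all ids from both tables; missing fields become NaN
outer = pd.merge(left, right, on="Employee_id", how="outer", indicator=True)
print(outer)
```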

* Grouping data

In [None]:
# Group the rows by employee name, then retrieve the rows of one group
group = df2.groupby('Employee_name')
print(group.get_group('Meera'))
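
Grouping is most useful when followed by an aggregation, such as an average per group. A minimal sketch, assuming hypothetical `Sector` and `PE` columns with made-up values:

```python
import pandas as pd

# Hypothetical data; sectors and values are made up
stocks = pd.DataFrame({
    "Sector": ["Tech", "Tech", "Energy", "Energy", "Health"],
    "PE": [20.0, 30.0, 8.0, 12.0, 18.0],
})

# Average PE per sector
mean_pe = stocks.groupby("Sector")["PE"].mean()
print(mean_pe)

# Several aggregations at once
summary = stocks.groupby("Sector")["PE"].agg(["mean", "max", "count"])
print(summary)
```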

* Concatenating

In [None]:
print(pd.concat([df1, df2]))
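
Note that `pd.concat` keeps the original row labels, so the index of the result can contain duplicates. Passing `ignore_index=True` renumbers the rows. A short sketch with two small made-up tables:

```python
import pandas as pd

first = pd.DataFrame({"Employee_id": ["1", "2"]})
second = pd.DataFrame({"Employee_id": ["3", "4"]})

# Default behavior: each table keeps its own index labels (0, 1, 0, 1)
stacked = pd.concat([first, second])
print(stacked)

# ignore_index=True builds a fresh index (0, 1, 2, 3)
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered)
```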

* Pandas applications
    + Since Pandas is built on top of NumPy, both libraries share common applications in data science
    and machine learning. 
    + Furthermore, because Pandas allows data to be organized in a tabular fashion,
    it is a very useful tool for displaying financial information.
    + It integrates very well with Jupyter Notebook

Pandas functions: https://pandas.pydata.org/docs/reference/general_functions.html </br>
References: https://www.geeksforgeeks.org/introduction-to-pandas-in-python/

4. Displaying financial data with plotly and streamlit

* We will use plotly and streamlit for the project
* Plotly is a very beginner-friendly way of plotting in Python, and it has many
applications in finance, such as the stock filter that we will build for this 
project.
* Streamlit allows us to seamlessly publish Plotly graphs online
* Good resources:
    + https://www.geeksforgeeks.org/python-plotly-tutorial/
    + Used in this lecture: https://towardsdatascience.com/a-multi-page-interactive-dashboard-with-streamlit-and-plotly-c3182443871a

First, we must install streamlit and plotly:
```
pip install streamlit
pip install plotly
```

To run a streamlit app, we will type:

```
streamlit run myapp.py
```

In the Python file that streamlit will run, we can enter the following code:

In [None]:
import streamlit as st
import pandas as pd
import plotly.express as px

# In this example, we will use the Gapminder data included in plotly.
df = pd.DataFrame(px.data.gapminder())

# Check out the data (print writes to the terminal running Streamlit,
# not to the web page)
print(df)

# Get unique country names
clist = df['country'].unique()

# Country dropdown
country = st.sidebar.selectbox("Select a country:",clist)

# Plot header
st.header("GDP per Capita over time")

# Creating a line with plotly
fig = px.line(df[df['country'] == country], 
    x = "year", y = "gdpPercap", title = country)

# Plotting
st.plotly_chart(fig)

* Pagination (optional, but it makes the program cleaner)

In [None]:
# This is the logic:

page = st.sidebar.selectbox('Select page',
  ['Country data','Continent data'])
if page == 'Country data':
  # Display the country content here
  pass
else:
  # Display the continent content here
  pass

In [None]:
# Actual implementation

# Importing packages
import streamlit as st
import pandas as pd
import plotly.express as px

# Page configurations
st.set_page_config(layout = "wide")
df = pd.DataFrame(px.data.gapminder())
st.header("National Statistics")
page = st.sidebar.selectbox('Select page',
  ['Country data','Continent data'])

# Selecting for country or continent
## Countries
if page == 'Country data':
  clist = df['country'].unique()
  country = st.selectbox("Select a country:",clist)
  col1, col2 = st.columns(2)
  fig = px.line(df[df['country'] == country], 
    x = "year", y = "gdpPercap",title = "GDP per Capita")
 
  col1.plotly_chart(fig,use_container_width = True)
  fig = px.line(df[df['country'] == country], 
    x = "year", y = "pop",title = "Population Growth")
  
  col2.plotly_chart(fig,use_container_width = True)

## Continents
else:
  contlist = df['continent'].unique()
 
  continent = st.selectbox("Select a continent:",contlist)
  col1,col2 = st.columns(2)
  fig = px.line(df[df['continent'] == continent], 
    x = "year", y = "gdpPercap",
    title = "GDP per Capita",color = 'country')
  
  col1.plotly_chart(fig, use_container_width = True)
  fig = px.line(df[df['continent'] == continent], 
    x = "year", y = "pop",
    title = "Population",color = 'country')
  
  col2.plotly_chart(fig, use_container_width = True)

# Notice the amount of repeated code. How can we optimize it?

5. Displaying financial data with matplotlib (one example)

* Although we won't use matplotlib for the project, it is the most popular
library for graphing data in Python, so I have included an example here, taken
from matplotlib's official website.

In [None]:
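# Example available at: 
# https://matplotlib.org/1.5.3/examples/pylab_examples/finance_demo.html
# Note: this example targets matplotlib 1.5. The matplotlib.finance module was
# later removed from matplotlib, so the import below fails on current versions.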

import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, WeekdayLocator,\
    DayLocator, MONDAY
from matplotlib.finance import quotes_historical_yahoo_ohlc, candlestick_ohlc


# (Year, month, day) tuples suffice as args for quotes_historical_yahoo
date1 = (2004, 2, 1)
date2 = (2004, 4, 12)


mondays = WeekdayLocator(MONDAY)        # major ticks on the mondays
alldays = DayLocator()              # minor ticks on the days
weekFormatter = DateFormatter('%b %d')  # e.g., Jan 12
dayFormatter = DateFormatter('%d')      # e.g., 12

quotes = quotes_historical_yahoo_ohlc('INTC', date1, date2)
if len(quotes) == 0:
    raise SystemExit

fig, ax = plt.subplots()
fig.subplots_adjust(bottom=0.2)
ax.xaxis.set_major_locator(mondays)
ax.xaxis.set_minor_locator(alldays)
ax.xaxis.set_major_formatter(weekFormatter)
#ax.xaxis.set_minor_formatter(dayFormatter)

#plot_day_summary(ax, quotes, ticksize=3)
candlestick_ohlc(ax, quotes, width=0.6)

ax.xaxis_date()
ax.autoscale_view()
plt.setp(plt.gca().get_xticklabels(), rotation=45, horizontalalignment='right')
