<a href="https://colab.research.google.com/github/FJada/artificial-intelligence/blob/main/Bike_Rides_and_the_Poisson_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Bike Rides and the Poisson Model

To help the urban planners, you are asked to model the daily bike rides in NYC using [this dataset](https://gist.github.com/sachinsdate/c17931a3f000492c1c42cf78bf4ce9fe/archive/7a5131d3f02575668b3c7e8c146b6a285acd2cd7.zip). The dataset's columns are the date, day of the week, high and low temperature, precipitation, and bike ride counts.



## Maximum Likelihood I 
 
A natural choice of distribution is the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution), which depends on a single parameter, λ, the average number of occurrences per interval. We want to estimate this parameter using Maximum Likelihood Estimation.
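For reference, the quantities involved can be written out explicitly. Writing $y_1, \dots, y_n$ for the observed daily counts (a symbol introduced here for illustration) and assuming the days are independent, the Poisson probability, the negative log likelihood, and its derivative are:

$$
\begin{aligned}
P(y \mid \lambda) &= \frac{\lambda^{y} e^{-\lambda}}{y!} \\
\mathrm{NLL}(\lambda) &= -\sum_{i=1}^{n} \log P(y_i \mid \lambda) = n\lambda - \log(\lambda)\sum_{i=1}^{n} y_i + \sum_{i=1}^{n} \log(y_i!) \\
\frac{d\,\mathrm{NLL}}{d\lambda} &= n - \frac{1}{\lambda}\sum_{i=1}^{n} y_i
\end{aligned}
$$

Setting the derivative to zero gives $\hat\lambda = \frac{1}{n}\sum_{i=1}^{n} y_i$, the sample mean, so a correct gradient descent run should settle at that value.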

Implement a gradient descent algorithm from scratch that estimates the Poisson parameter λ according to the maximum likelihood criterion. Plot the estimated mean against the iteration number to show convergence towards the true mean.
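As a rough illustration of what such a routine could look like, here is a minimal sketch that runs gradient descent on the mean negative log likelihood and records λ at every iteration. It uses synthetic Poisson counts rather than the bike data, and the function name, the starting value, and the learning rate are all assumptions chosen for this toy scale, not values taken from the assignment.

```python
import numpy as np

def poisson_mle_gradient_descent(counts, initial_lambda=1.0, learning_rate=1.0,
                                 tolerance=1e-8, max_iterations=5000):
    """Minimise the mean Poisson NLL over lambda; return (estimate, history)."""
    counts = np.asarray(counts, dtype=float)
    sample_mean = counts.mean()
    lam = float(initial_lambda)
    history = [lam]
    for _ in range(max_iterations):
        # Gradient of the mean NLL with respect to lambda: 1 - mean(y) / lambda
        grad = 1.0 - sample_mean / lam
        # Descent step, clipped so that lambda stays strictly positive
        new_lam = max(lam - learning_rate * grad, 1e-8)
        history.append(new_lam)
        converged = abs(new_lam - lam) < tolerance
        lam = new_lam
        if converged:
            break
    return lam, np.array(history)

rng = np.random.default_rng(0)
# Synthetic stand-in for the ride counts (assumed rate of 20 per day)
synthetic_counts = rng.poisson(lam=20.0, size=500)

lambda_hat, lambda_history = poisson_mle_gradient_descent(
    synthetic_counts, initial_lambda=5.0
)
print(lambda_hat, synthetic_counts.mean())
```

The returned `lambda_history` is what one would plot against the iteration index, with a horizontal line at the sample mean for comparison. On the real counts, which are much larger than 20, the step size would likely need to be rescaled, because the curvature of the objective near the optimum shrinks as the mean grows.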

References: 

1. [This blog post](https://towardsdatascience.com/the-poisson-process-everything-you-need-to-know-322aa0ab9e9a). 

2. [This blog post](https://towardsdatascience.com/understanding-maximum-likelihood-estimation-fa495a03017a); note the negative log likelihood function.



Resources used:

https://towardsdatascience.com/understanding-maximum-likelihood-estimation-fa495a03017a

https://www.statology.org/mle-poisson-distribution/

https://towardsdatascience.com/the-poisson-process-everything-you-need-to-know-322aa0ab9e9a

https://mlstory.org/optimization.html

https://medium.com/computronium/gradient-based-optimizations-under-the-deep-learning-lens-ac99e62289a8

In [None]:
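import math
import numpy as np
import pandas as pd

#get data
# `lambda` is a reserved word in Python, so the rate parameter is named `lam` below.
df = pd.read_csv("nyc_bb_bicyclist_counts.csv")
# "bicyclist" is the column name used in this notebook; change it if the CSV header differs
bicyclist = df["bicyclist"].to_numpy()

def poisson_distribution(x, lam):
  # Poisson pmf lam^x * exp(-lam) / x!, evaluated in log space to avoid overflow
  return np.exp(x * np.log(lam) - lam - math.lgamma(x + 1))

def negative_log_likelihood(bicyclist, parameters):
  # NLL(lam) = n * lam - log(lam) * sum(x) + sum(log(x!))
  n = len(bicyclist)
  lam = float(np.ravel(parameters)[0])
  log_factorials = np.sum([math.lgamma(x + 1) for x in bicyclist])
  return n * lam - np.log(lam) * np.sum(bicyclist) + log_factorials


def gradient(bicyclist, parameters):
  # d(NLL)/d(lam) = n - sum(x) / lam
  lam = float(np.ravel(parameters)[0])
  n = len(bicyclist)
  partial_derivative_lambda = n - np.sum(bicyclist) / lam
  return np.array([partial_derivative_lambda])

def gradient_descent(bicyclist, learning_rate = 0.01, tolerance = 1e-4, iterations = 1000, initial_lambda = 1.0):
  """
  Minimize the Poisson negative log likelihood over lambda by gradient descent.
  Stops early once the size of an update falls below `tolerance`.
  The gradient is summed over all days, so a suitable learning_rate depends on the data scale.
  """
  curr_lambda = np.array([initial_lambda], dtype=float)
  for i in range(iterations):
    # step against the gradient of the negative log likelihood
    difference = -learning_rate * gradient(bicyclist, curr_lambda)
    curr_lambda = curr_lambda + difference
    if np.abs(difference[0]) < tolerance:
      break
  return curr_lambda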





## Maximum Likelihood II

A colleague of yours suggests that the parameter $\lambda$ must itself depend on the weather and other factors, since people bike when it is not raining. Assume that you model $\lambda$ as

$$\lambda_i = \exp(\mathbf w^T \mathbf x_i)$$

where $\mathbf x_i$ is the feature vector of the $i$-th example and $\mathbf w$ is a vector of parameters.
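Writing $y_i$ for the observed count on day $i$ (a symbol introduced here for illustration), the Poisson log likelihood under this model and its gradient with respect to $\mathbf w$ work out to:

$$
\begin{aligned}
\ell(\mathbf w) &= \sum_{i=1}^{n}\left[\, y_i\,\mathbf w^T \mathbf x_i - \exp(\mathbf w^T \mathbf x_i) - \log(y_i!) \,\right] \\
\frac{\partial \ell}{\partial \mathbf w} &= \sum_{i=1}^{n}\left( y_i - \exp(\mathbf w^T \mathbf x_i) \right)\mathbf x_i
\end{aligned}
$$

A single-sample SGD step with learning rate $\eta$ (also an assumed symbol) then moves $\mathbf w$ along one term of that sum: $\mathbf w \leftarrow \mathbf w + \eta\,(y_i - \lambda_i)\,\mathbf x_i$.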

Train the model with SGD with this assumption and compare the MSE of the predictions with the `Maximum Likelihood I` approach. 

You may want to use [this partial derivative of the log likelihood function](http://home.cc.umanitoba.ca/~godwinrt/7010/poissonregression.pdf).
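To make the training loop and the MSE comparison concrete, the following is a minimal sketch on synthetic data, not the bike dataset. The feature ranges, the "true" weights, the function name `sgd_poisson_regression`, the learning rate, and the epoch count are all hypothetical choices made so that the toy problem stays numerically stable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: an intercept column plus two features in [-1, 1].
n_samples = 1000
features = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
X = np.column_stack([np.ones(n_samples), features])
true_w = np.array([1.5, 0.5, -0.8])  # hypothetical ground-truth weights
y = rng.poisson(np.exp(X @ true_w))

def sgd_poisson_regression(X, y, learning_rate=0.01, epochs=50, seed=0):
    """Fit w in lambda_i = exp(w^T x_i) by single-sample stochastic
    gradient ascent on the Poisson log likelihood."""
    order_rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch
        for i in order_rng.permutation(len(y)):
            lam_i = np.exp(X[i] @ w)
            # Per-sample gradient of the log likelihood: (y_i - lambda_i) * x_i
            w += learning_rate * (y[i] - lam_i) * X[i]
    return w

w_hat = sgd_poisson_regression(X, y)

# In-sample MSE of the regression versus the constant-lambda model,
# whose maximum likelihood estimate is the sample mean.
mse_regression = np.mean((y - np.exp(X @ w_hat)) ** 2)
mse_constant = np.mean((y - y.mean()) ** 2)
print(w_hat, mse_regression, mse_constant)
```

With real weather features, scaling the inputs (for example, standardizing temperatures) before training is probably necessary, since $\exp(\mathbf w^T \mathbf x_i)$ grows quickly and large raw feature values can make the updates diverge.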

In [None]:
import numpy as np
import pandas as pd
import math

#get data
df = pd.read_csv("nyc_bb_bicyclist_counts.csv")
bicyclist = df["bicyclist"]


