<a href="https://colab.research.google.com/github/PozzOver13/learning/blob/main/mathematics_4_machine_learning/20240206_markov_chains.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# References

https://elijahpotter.dev/articles/markov_chains_are_the_original_language_models?utm_source=substack&utm_medium=email  Markov Chains to predict next word  
https://www.datacamp.com/tutorial/markov-chains-python-tutorial  Markov Chains in python

# Markov Chains

Markov Chains are mathematical models that represent a sequence of events where the probability of transitioning from one state to another depends solely on the current state and is independent of the sequence of events that preceded it. In other words, the future state of the system is determined only by its present state, and not by how it arrived at that state.
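
The property described above can be written compactly. The notation here is an assumption introduced for illustration: $X_t$ denotes the state at time step $t$, and $i$, $j$, $i_0, \dots, i_{t-1}$ denote particular states.

$$
P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i)
$$

Conditioning on the full history gives the same probability as conditioning on the current state alone.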

In credit risk modeling, a data scientist can use Markov Chains to describe how a customer's credit state changes over time. Each state represents a creditworthiness level, such as good, fair, or poor credit. Movement between states is governed by transition probabilities: the probability of moving from one credit state to another within a given time period.

For example, consider a simplified Markov Chain representing credit states {Good, Fair, Poor}. The transition matrix might look like:

```
            Good   Fair   Poor      (next state)
Good   |    0.9    0.1    0     |
Fair   |    0.2    0.7    0.1   |
Poor   |    0      0.3    0.7   |
```

Here, the numbers in the matrix represent transition probabilities. If a customer is currently in a "Good" credit state, there's a 90% chance they will remain in the "Good" state, a 10% chance they will move to the "Fair" state, and no chance of moving to the "Poor" state in the next time period.
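
As a minimal sketch, the same matrix can be stored as a NumPy array and checked for validity. The variable names (`states`, `transition_matrix`, `row_sums`, `good`) are illustrative choices, and the row-per-current-state layout is assumed from the description above.

```python
import numpy as np

states = ["Good", "Fair", "Poor"]

# Row i holds the probabilities of moving from state i to each state
# in the next period (rows = current state, columns = next state).
transition_matrix = np.array([
    [0.9, 0.1, 0.0],  # from Good
    [0.2, 0.7, 0.1],  # from Fair
    [0.0, 0.3, 0.7],  # from Poor
])

# Each row is a probability distribution, so it must sum to 1.
row_sums = transition_matrix.sum(axis=1)
assert np.allclose(row_sums, 1.0)

# Read off the transition probabilities for a customer currently in "Good".
good = states.index("Good")
for target, p in zip(states, transition_matrix[good]):
    print(f"Good -> {target}: {p:.0%}")
```

The row-sum check is a cheap guard worth keeping whenever a transition matrix is typed in by hand or estimated from data.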

By simulating the Markov Chain over multiple time periods, a data scientist in credit risk can model the dynamics of credit transitions and assess the likelihood of a customer moving between credit states. This information can be valuable for predicting creditworthiness changes and managing associated risks in a financial portfolio.
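
The following is one possible sketch of such a simulation, using the example matrix from above. The helper names `distribution_after` and `simulate_path`, the starting state, the 12-period horizon, and the random seed are all assumptions made for illustration.

```python
import numpy as np

states = ["Good", "Fair", "Poor"]
transition_matrix = np.array([
    [0.9, 0.1, 0.0],  # from Good
    [0.2, 0.7, 0.1],  # from Fair
    [0.0, 0.3, 0.7],  # from Poor
])

def distribution_after(start, P, n_periods):
    """Probability of each state after n_periods, given a starting distribution."""
    return start @ np.linalg.matrix_power(P, n_periods)

def simulate_path(P, start_state, n_periods, rng):
    """Sample one sequence of state indices by drawing each transition in turn."""
    path = [start_state]
    for _ in range(n_periods):
        current = path[-1]
        # Draw the next state using the row of the current state.
        path.append(int(rng.choice(len(P), p=P[current])))
    return path

# A customer who is certainly in "Good" today.
start = np.array([1.0, 0.0, 0.0])
after_1 = distribution_after(start, transition_matrix, 1)  # approx. [0.9, 0.1, 0.0]
after_2 = distribution_after(start, transition_matrix, 2)  # approx. [0.83, 0.16, 0.01]

# One random 12-period trajectory starting from "Good".
rng = np.random.default_rng(seed=0)
path = simulate_path(transition_matrix, 0, 12, rng)
print([states[s] for s in path])
```

The matrix-power version gives exact state probabilities, while the sampled path shows what a single customer's history might look like; averaging many sampled paths should approach the exact distribution.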

# Next location example

In [1]:
end_prob_start_from_grocery = [0.3, 0.7]  # starting from the grocery store: 30% chance of staying, 70% chance of leaving for the planetarium each hour
end_prob_start_from_planetarium = [0.1, 0.9]  # starting from the planetarium: 10% chance of leaving for the grocery store, 90% chance of staying each hour

In [2]:
import numpy as np
import pandas as pd

In [18]:
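# Stack the two vectors as columns: each column holds the end-location
# probabilities for one starting location, so every column sums to 1.
matrix_location_prob = np.column_stack((end_prob_start_from_grocery, end_prob_start_from_planetarium))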

In [19]:
matrix_location_prob

array([[0.3, 0.1],
       [0.7, 0.9]])

In [21]:
pd.DataFrame(matrix_location_prob,
             columns=['start_at_grocery', 'start_at_planetarium'],
             index=['end_at_grocery', 'end_at_planetarium'])

|                    | start_at_grocery | start_at_planetarium |
|--------------------|------------------|----------------------|
| end_at_grocery     | 0.3              | 0.1                  |
| end_at_planetarium | 0.7              | 0.9                  |


In [22]:
presence_prob = [0.25, 0.75]  # current probability that Alice is at the grocery store / at the planetarium

In [26]:
matrix_actual_presence_prob = np.array(presence_prob)

In [29]:
matrix_actual_presence_prob.reshape(2, 1)

array([[0.25],
       [0.75]])

In [30]:
pd.DataFrame(matrix_actual_presence_prob.reshape(2, 1),
             columns=['% Alice Present'],
             index=['grocery', 'planetarium'])

|             | % Alice Present |
|-------------|-----------------|
| grocery     | 0.25            |
| planetarium | 0.75            |
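
A natural next step, sketched here as an assumption about where the example is heading, is to multiply the transition matrix by the current presence vector to get the probabilities of Alice's location one hour later. Because each column of `matrix_location_prob` corresponds to a starting location, the matrix goes on the left of the product. The name `next_hour_prob` is introduced for illustration.

```python
import numpy as np

# Column j = end-location probabilities given that Alice starts at location j
# (column 0: grocery, column 1: planetarium), as built in the cells above.
matrix_location_prob = np.array([[0.3, 0.1],
                                 [0.7, 0.9]])

# Current probability that Alice is at the grocery store / the planetarium.
presence_prob = np.array([0.25, 0.75])

# One-hour update: weight each column by the probability of starting there.
next_hour_prob = matrix_location_prob @ presence_prob
print(next_hour_prob)  # approximately [0.15, 0.85]
```

Applying the same product again to `next_hour_prob` would give the probabilities after two hours, and so on.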
