<img src="https://dauphine.psl.eu/fileadmin/_processed_/9/2/csm_damier_logo_Dauphine_f7b37a1ff2.jpg" width="200" style="vertical-align:middle" /> <h1>Master 222: Introduction to Python </h1>





[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jandsy/introduction_python_dauphine/blob/main/Session_2/numpy_and_pandas_python_dauphine.ipynb)


# Introduction to Matplotlib
> Matplotlib is a Python library for plotting and visualizing data. It is designed to produce a wide variety of plots and graphs. Matplotlib includes a submodule called pyplot, which provides an interface similar to that of the commercial software MATLAB, with many similarly named functions.

> There are libraries like Seaborn that can automatically beautify the figures or give them a different style, but we will not be integrating them in this training.

> Furthermore, Matplotlib is a very rich library, and not all its functionalities can be covered. This training chooses to explore certain functions more than others, with the overall goal of enabling any student to become proficient with the module by the end of the course.

> To begin, import the matplotlib.pyplot module under the conventional short name 'plt'. <br>
Once a graph is constructed, the plt.show() command displays it.<br>
In a notebook, such as here on Colab, adding %matplotlib inline at the top of the notebook displays the figures automatically after each cell that uses a pyplot command is executed.<br>

- Import matplotlib.pyplot and add %matplotlib inline
- Import numpy

In [None]:
## Insert your code here

**Curve Definition and Plotting**
> A curve is, by definition, a set of points with coordinates (x, y) that may or may not be connected by a line. The more points there are, the smoother the curve will appear.

> The plot() method allows for plotting curves that connect points whose x (abscissa) and y (ordinate) values are provided in lists or arrays.<br>
To plot a graph with 'x' values on the horizontal axis and 'y' values on the vertical axis, we write: plt.plot(x,y).

- Plot a curve with the x-values [0,2,4,6] and the y-values [1,4,4,8].

In [None]:
## Insert your code here

**Automatic Abscissa Generation**

> If only a single list or array is inserted into the plot() command, Matplotlib assumes that it's a sequence of y (ordinate) values and automatically generates the x (abscissa) values for you. The x values will be the indices of the y values, starting from 0.
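
As a small illustration of this behaviour, the sketch below passes a single list (illustrative values, not those of the exercise) and inspects the x-values that Matplotlib generated. It assumes a notebook with `%matplotlib inline`; in a plain script, a final `plt.show()` would be needed to display the figure.

```python
import matplotlib.pyplot as plt

# Only y-values are given (illustrative values, not from the exercise)
y = [5, 7, 6, 9]

# plot() returns a list of Line2D objects; there is a single curve here
(line,) = plt.plot(y)

# The x-values were filled in automatically with the indices of y
print(line.get_xdata())
```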

- Plot a curve using the list [1,3,2,4].


In [None]:
## Insert your code here

**Adding Titles and Axis Labels**

> To add a title to the graphs, we use the title method.
> To add labels to the axes, we use the xlabel and ylabel methods.
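
A minimal sketch of these three calls, using made-up data and label texts chosen only for illustration:

```python
import matplotlib.pyplot as plt

# Illustrative data (assumed values)
plt.plot([1, 2, 3], [2, 4, 8])

# Each call acts on the current axes
plt.title("Doubling sequence")
plt.xlabel("step")
plt.ylabel("value")
```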

- Plot a curve passing through the following points: (50,1), (100,3), (200,4).
- Title the figure 'My First Curve'.
- Label the x-axis as 'abscissa' and the y-axis as 'ordinates'.

In [None]:
## Insert your code here

> The plot() function simply connects points in the order they are provided. It's possible to provide multiple points with the same x-coordinate to draw a specific shape.
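
The sketch below draws a right triangle whose first two points share the same x-coordinate, then widens the visible area with `plt.xlim` and `plt.ylim`, two pyplot functions not introduced above. The coordinates and limits are illustrative assumptions.

```python
import matplotlib.pyplot as plt

# Points are joined in order: (0,0) -> (0,2) -> (2,0) -> back to (0,0)
x = [0, 0, 2, 0]
y = [0, 2, 0, 0]
plt.plot(x, y)

# Set the visible range of each axis
plt.xlim(-1, 3)
plt.ylim(-1, 3)
```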

- Create the following x and y lists: x = [0, 1, 1, 0, 0] , y = [0, 0, 1, 1, 0].<br>
Use the plot function to connect these points and set the limits of both axes from -1 to 2.

In [None]:
## Insert your code here

Similarly, one can plot a parametric curve using a sequence **t**. For this, provide the `plot` method with one function of **t** for the x-coordinates and another for the y-coordinates.
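
For instance, a spiral can be sketched this way. The curve $(t\cos t, t\sin t)$ and the number of points are assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Parameter values: 200 evenly spaced points between 0 and 4*pi
t = np.linspace(0, 4 * np.pi, 200)

# One function of t for each coordinate
x = t * np.cos(t)
y = t * np.sin(t)

(line,) = plt.plot(x, y)
```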

- Use the **linspace** method from *Numpy* to create a sequence of 100 numbers between 0 and $2\pi$.
- Plot the parametric curve defined by $$ f(t) = (\sin(2t), \sin(3t)), t \in [0, 2\pi].$$


In [None]:
## Insert your code here

**Exercise: Simulating Asset Paths using Geometric Brownian Motion**

**Goal:** Create a simulation of potential future stock prices using the Geometric Brownian Motion (GBM) model and visualize the results.

**Import Required Libraries**
- Import numpy and matplotlib.pyplot

**Define GBM Parameters**
- $S_0$: Initial stock price
- $T$: Time horizon (in years)
- $r$: Risk-free rate (annualized)
- $\sigma$: Volatility (annualized)
- $dt$: Time increment, e.g., a day
- $n$: Number of time steps
- $m$: Number of potential paths to simulate

**Simulate GBM Paths**

We recall that GBM is defined as
$$ dS_t = \mu S_t dt + \sigma S_t dW_t $$

Where:
- $S_t$: Stock price at time $t$.
- $\mu$: Drift (expected return); under the risk-neutral measure it equals the risk-free rate $r$.
- $\sigma$: Volatility of the stock.
- $dW_t$: Increment of a Wiener process (Brownian motion).

For discrete time intervals, the stock price $S_{t+\Delta t}$ given the price $S_t$ is simulated as:

$$ S_{t+\Delta t} = S_t \exp \left( \left(\mu - \frac{\sigma^2}{2}\right) \Delta t + \sigma \sqrt{\Delta t}\, Z \right) $$

Where:
- $\Delta t$: Size of the time step.
- $Z$: A random draw from a standard normal distribution.
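
One possible way to turn the discrete formula into code is sketched below. All parameter values are assumptions chosen for illustration, the drift is set to an assumed risk-free rate, and the paths are built by summing log-increments, which is equivalent to applying the formula step by step.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example parameters (not prescribed by the exercise)
S0, T, mu, sigma = 100.0, 1.0, 0.03, 0.2
n, m = 252, 10                      # time steps and number of paths
dt = T / n

rng = np.random.default_rng(seed=0)     # seeded for reproducibility
Z = rng.standard_normal(size=(m, n))    # one normal draw per path and step

# Log-increment of each step: (mu - sigma^2 / 2) dt + sigma sqrt(dt) Z
log_steps = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z

# Cumulative sums give log(S_t / S_0); a leading 0 column stands for t = 0
log_paths = np.hstack([np.zeros((m, 1)), np.cumsum(log_steps, axis=1)])
paths = S0 * np.exp(log_paths)          # shape (m, n + 1)

# One curve per simulated path
time = np.linspace(0, T, n + 1)
plt.plot(time, paths.T)
plt.xlabel("time (years)")
plt.ylabel("price")
plt.title("Simulated GBM paths")
```

Working with cumulative sums avoids an explicit Python loop over the time steps; a loop applying the formula one step at a time would give the same paths.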

**Visualize the Simulated Paths**




In [None]:
## Insert your code here

## Matplotlib Exercises

### 1. Basic Line Plot
- Generate a sequence of numbers from -10 to 10.
- Plot their square values on a graph. Label the x-axis as "Numbers" and the y-axis as "Squares" and give the plot a title.

### 2. Multiple Line Plots
- Using the same sequence from the previous exercise, plot both the square and cube values on the same graph. Use different colors and line styles for each plot and add a legend to distinguish between them.

### 3. Scatter Plot
- Generate 50 random numbers for x-values and y-values.
- Plot them using a scatter plot. Experiment with changing the size, color, and marker style.

### 4. Histogram
- Generate 1000 random numbers following a normal distribution using `numpy`.
- Plot a histogram to visualize the distribution. Play around with the number of bins.

### 5. Bar Chart
- Take a list of fruits: `['Apple', 'Banana', 'Cherry', 'Date', 'Elderberry']`.
- Assume some random sale values for each fruit and plot a bar chart to visualize fruit sales.

### 6. Pie Chart
- Using the fruit sales data from the previous exercise, plot a pie chart to show the proportion of sales by fruit. Ensure each section of the pie chart is labeled and displays a percentage.

### 7. Subplots
- Create four subplots in a 2x2 grid.
  - In the top-left, plot a sine curve.
  - In the top-right, plot a cosine curve.
  - In the bottom-left, plot a tangent curve.
  - In the bottom-right, plot a circle.
- Ensure each subplot has a title.

### 8. 3D Plotting
- Create a 3D surface plot. You can use the `numpy` meshgrid function to generate x, y, and z values. For instance, plot the surface $z = x^2 + y^2$.
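
As a hedged starting point, the sketch below shows the general pattern on a different surface, $z = \sin(x)\cos(y)$, chosen only for illustration. It assumes a reasonably recent Matplotlib, where `projection="3d"` is available without an extra import.

```python
import numpy as np
import matplotlib.pyplot as plt

# meshgrid turns two 1-D axes into two 2-D coordinate arrays
x = np.linspace(-3, 3, 50)
y = np.linspace(-3, 3, 50)
X, Y = np.meshgrid(x, y)

# Illustrative surface (assumed example, different from the exercise)
Z = np.sin(X) * np.cos(Y)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")   # axes with a third dimension
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```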

### 9. Animation
- Create an animation of a sine wave whose frequency increases over time.

### 10. Customization
- Choose any of the previous exercises and experiment with the aesthetics. Change colors, fonts, line styles, marker styles, etc. Also, explore how to add text annotations, grid lines, and other custom elements.



# Introduction to yfinance

### What is yfinance?

`yfinance` is an open-source Python library, named after Yahoo! Finance, that lets you download financial data from Yahoo! Finance. This can include stock prices, historical data, financial statements, and much more.

### Why use `yfinance`?

If you're interested in financial analysis, algorithmic trading, or just want to understand how the stock market behaves over time, `yfinance` is a great tool to start with. With just a few lines of Python code, you can retrieve vast amounts of financial data.

### Getting Started

Start by importing the library:
```python
import yfinance as yf
```
Fetching Data:

You can fetch data for a specific stock or index. Here's how to get the historical data for Apple Inc. (AAPL) for the past month:
```python
data = yf.download("AAPL", period="1mo")
print(data)
```
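
Plotting a price column follows the same pyplot pattern as before. Because downloading requires network access, the sketch below uses a small synthetic DataFrame with a `Close` column and a date index as a stand-in for the downloaded data; the prices are made up. Depending on the yfinance version, the downloaded columns may have two levels, in which case the column selection needs adapting.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the downloaded data: made-up prices on 20 business days
dates = pd.bdate_range("2024-01-01", periods=20)
data = pd.DataFrame({"Close": np.linspace(180.0, 190.0, 20)}, index=dates)

# Dates on the horizontal axis, prices on the vertical axis
plt.plot(data.index, data["Close"])
plt.title("Closing price (synthetic stand-in)")
plt.xlabel("date")
plt.ylabel("close")
plt.xticks(rotation=45)     # tilt the date labels so they do not overlap
```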

- Fetch the data for AAPL
- Using matplotlib.pyplot, plot the closing prices



In [None]:
## Insert your code here

**Exercise: Analyzing Historical Stock Prices**
> Context:
You are a financial analyst at an investment firm. You have been given a dataset containing historical stock prices of several companies over the past decade. Your task is to clean, analyze, and visualize the data to help the firm make informed investment decisions.

> Dataset:
The dataset contains the following columns: 'Date', 'Ticker', 'Open', 'High', 'Low', 'Close', 'Volume'.


**Data Loading and Inspection**:

- Fetch the data for 'AAPL', 'MSFT' and 'GOOGL' into a single DataFrame.
- Display the first 10 rows of the dataset.
- Check for any missing values in the dataset.

**Data Cleaning**:

- Handle any missing values by either filling them with appropriate values or dropping them.
- Ensure the 'Date' column is in datetime format.

**Data Transformation**:

- Set the 'Date' column as the index of the DataFrame.
- Create a new column 'Price_Average' which is the average of 'High' and 'Low' prices for the day.
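
The date conversion, the index change, and the new column can be sketched on a tiny hand-made table in the layout described above. The numbers are made up and only some of the columns are included.

```python
import pandas as pd

# Tiny made-up dataset (not real prices)
df = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"],
    "Ticker": ["AAPL", "MSFT", "AAPL", "MSFT"],
    "High": [186.0, 376.0, 185.0, 373.0],
    "Low": [182.0, 370.0, 183.0, 369.0],
})

df["Date"] = pd.to_datetime(df["Date"])     # strings -> datetime64
df = df.set_index("Date")                   # dates become the index
df["Price_Average"] = (df["High"] + df["Low"]) / 2
```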

**Data Analysis**:

- Find the date and company with the highest trading volume.
- Calculate the monthly average closing price for each company.
- Determine the company with the widest average daily price range (i.e., the difference between High and Low prices) over the period.
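
These three questions map onto `idxmax` and `groupby`. The sketch below uses made-up numbers and reads the High-Low comparison as the average daily range per company, which is one possible interpretation.

```python
import pandas as pd

# Made-up long-format data: one row per (date, ticker)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2024-01-30", "2024-01-31", "2024-02-01"] * 2),
    "Ticker": ["AAPL"] * 3 + ["MSFT"] * 3,
    "High": [101.0, 103.0, 111.0, 203.0, 207.0, 213.0],
    "Low": [99.0, 101.0, 109.0, 198.0, 202.0, 208.0],
    "Close": [100.0, 102.0, 110.0, 200.0, 204.0, 210.0],
    "Volume": [10, 20, 30, 15, 50, 5],
})

# Row with the largest traded volume
top = df.loc[df["Volume"].idxmax(), ["Date", "Ticker", "Volume"]]

# Average close per ticker and calendar month
month = df["Date"].dt.to_period("M")
monthly = df.groupby(["Ticker", month])["Close"].mean()

# Average daily High-Low range per ticker, then the ticker with the widest one
df["Range"] = df["High"] - df["Low"]
widest = df.groupby("Ticker")["Range"].mean().idxmax()
```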

**Data Aggregation**:

- Group the data by 'Ticker' and calculate the total trading volume and average closing price for each company over the period.
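
A minimal sketch of such a grouped summary, using pandas named aggregation on made-up numbers (the output column names `total_volume` and `avg_close` are arbitrary choices):

```python
import pandas as pd

# Made-up data (not real prices or volumes)
df = pd.DataFrame({
    "Ticker": ["AAPL", "AAPL", "MSFT", "MSFT"],
    "Close": [100.0, 102.0, 200.0, 206.0],
    "Volume": [10, 20, 15, 5],
})

# One row per ticker, one column per aggregate
summary = df.groupby("Ticker").agg(
    total_volume=("Volume", "sum"),
    avg_close=("Close", "mean"),
)
```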

**Statistical Analysis**:

- Calculate the correlation between the daily closing prices of different companies.
- Determine the company with the highest and lowest volatility based on the standard deviation of closing prices.
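
Both computations are easiest on a wide table with one column of closing prices per company. The sketch below builds such a table by hand with made-up prices; with long-format data, a pivot on 'Ticker' would be needed first.

```python
import pandas as pd

# Made-up closing prices, one column per company
closes = pd.DataFrame(
    {
        "AAPL": [100.0, 102.0, 101.0, 105.0],
        "MSFT": [200.0, 204.0, 202.0, 210.0],
        "GOOGL": [140.0, 139.0, 141.0, 138.0],
    },
    index=pd.bdate_range("2024-01-01", periods=4),
)

corr = closes.corr()        # pairwise correlation matrix of the columns
vol = closes.std()          # sample standard deviation of each column
most_volatile = vol.idxmax()
least_volatile = vol.idxmin()
```

Note that the standard deviation of price levels depends on how expensive a share is; the standard deviation of daily returns (`closes.pct_change().std()`) is a common alternative when comparing companies.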

**File I/O**:

- Save the cleaned and transformed dataset to a new CSV file.
- Save the plots as image files.

**Bonus**:

- Perform a simple linear regression analysis to predict future prices based on historical prices.
- Create a simple moving average trading strategy and backtest it on one of the companies.

In [None]:
## Insert your code here

**Exercise: Momentum Strategy on 50 Stocks**

## Objective
Explore the performance of a simple momentum strategy on a subset of 50 stocks.

## Procedure

### 1. Stock Selection (Hard Question)
- Use `pandas` to fetch the list of all tickers from https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
- Randomly select 50 stocks from this list.
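
The random selection step can be sketched on a short stand-in list, since reading the Wikipedia table requires network access. The list below is a hypothetical subset; the commented `pd.read_html` line is an assumption about where the table and its "Symbol" column sit on the page and may need adjusting.

```python
import pandas as pd

# Stand-in for the ticker column of the Wikipedia table (hypothetical subset).
# With network access, something like the following may return the real column:
#   symbols = pd.read_html(url)[0]["Symbol"]
symbols = pd.Series(["AAPL", "MSFT", "BRK.B", "GOOGL", "AMZN", "BF.B", "JPM", "XOM"])

# Yahoo Finance writes share classes with a dash (BRK-B) rather than a dot
symbols = symbols.str.replace(".", "-", regex=False)

# Draw without replacement; random_state makes the draw reproducible
# (the exercise asks for 50 tickers out of the full list)
selected = symbols.sample(n=5, random_state=42).tolist()
```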

### 2. Data Collection
- For each stock, download the daily adjusted closing prices for the last 365 days.
- Store this data in a pandas DataFrame with dates as the index.

### 3. Momentum Calculation
- Define momentum as the percentage change in the stock price over the previous n days (e.g., n=20 for a month).
- Using pandas, compute the momentum for each stock for each day.
- Rank the stocks based on their momentum every day.
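
On a table with one price column per stock, the momentum and the daily ranking can be sketched as follows. The prices are synthetic random walks for four hypothetical tickers, not market data.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for four hypothetical tickers
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=60)
log_ret = rng.normal(0.0, 0.01, size=(60, 4))
prices = pd.DataFrame(100 * np.exp(log_ret.cumsum(axis=0)),
                      index=dates, columns=["A", "B", "C", "D"])

n = 20
# Percentage change over the previous n rows; the first n rows are NaN
momentum = prices.pct_change(n)

# Rank across stocks within each day: 1 = strongest momentum
ranks = momentum.rank(axis=1, ascending=False)
```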

### 4. Strategy
- At the beginning of each month, buy the top 10 stocks based on momentum from the previous month.
- At the end of the month, sell all the stocks.
- Repeat this process for each month in your dataset.
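
One possible reading of these rules is sketched below on synthetic prices: the signal is the previous calendar month's return, positions are equal-weighted, and buying and selling happen at month-end closing prices. These choices, the six hypothetical tickers, and `k = 2` are assumptions made to keep the example small.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices for six hypothetical tickers (not real data)
rng = np.random.default_rng(1)
dates = pd.bdate_range("2023-01-02", periods=252)
tickers = ["A", "B", "C", "D", "E", "F"]
log_ret = rng.normal(0.0, 0.01, size=(len(dates), len(tickers)))
prices = pd.DataFrame(100 * np.exp(log_ret.cumsum(axis=0)),
                      index=dates, columns=tickers)

# Last price of each calendar month, then month-over-month returns
month_end = prices.groupby(prices.index.to_period("M")).last()
monthly_ret = month_end.pct_change()

# Signal available at the start of a month: the previous month's return
signal = monthly_ret.shift(1)

k = 2   # number of stocks held (the exercise uses 10 out of 50)
held = {}
for month in signal.index:
    past = signal.loc[month].dropna()
    if len(past) < k:
        continue                        # not enough history yet
    winners = past.nlargest(k).index
    # Equal-weighted return of the winners over the holding month
    held[month] = monthly_ret.loc[month, winners].mean()

strategy_ret = pd.Series(held)          # one strategy return per month
```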

### 5. Performance Analysis
- Calculate the monthly and annualized return of this strategy.
- Compare the strategy's performance against a benchmark (e.g., S&P 500).
- Plot the cumulative returns of your strategy and the benchmark.

### 6. Risk Analysis
- Using numpy, compute the standard deviation of the monthly returns to get a measure of the strategy's risk.
- Calculate the Sharpe Ratio.
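
The return and risk measures can be sketched from a series of monthly returns. The four returns below are illustrative numbers, not results, and the risk-free rate is assumed to be zero. Conventions vary; this sketch annualizes the Sharpe ratio by multiplying the monthly ratio by $\sqrt{12}$.

```python
import numpy as np

# Assumed monthly strategy returns (illustrative numbers, not results)
monthly_returns = np.array([0.02, -0.01, 0.03, 0.01])
risk_free_monthly = 0.0             # assumed risk-free rate per month

# Growth of one unit invested, compounding month after month
cumulative = np.cumprod(1 + monthly_returns)

# Compound growth rescaled to a 12-month horizon
n_months = len(monthly_returns)
annualized_return = cumulative[-1] ** (12 / n_months) - 1

# Sample standard deviation of monthly returns as the risk measure
monthly_vol = np.std(monthly_returns, ddof=1)

# Mean excess return per unit of risk, annualized
excess = monthly_returns - risk_free_monthly
sharpe = excess.mean() / monthly_vol * np.sqrt(12)
```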

## Hints
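- To get the data of a stock using yfinance:
  ```python
  import yfinance as yf
  # auto_adjust=False keeps the 'Adj Close' column used in the next hint
  data = yf.download("AAPL", start="2022-01-01", end="2023-01-01", auto_adjust=False)
  ```
- To compute momentum:
  ```python
  n = 20
  # Percentage change over the previous n trading days
  momentum = data['Adj Close'].pct_change(n)
  ```
- To rank stocks based on momentum:
  ```python
  # Assumes `momentum` is a Series with one value per stock (e.g. one day's row)
  ranked_stocks = momentum.sort_values(ascending=False)
  ```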


In [None]:
## Insert your code here