<img src="https://dauphine.psl.eu/fileadmin/_processed_/9/2/csm_damier_logo_Dauphine_f7b37a1ff2.jpg" width="200" style="vertical-align:middle" /> <h1>Master 222: Introduction to Python </h1>





[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jandsy/introduction_python_dauphine/blob/main/Session_2/corrected_numpy_and_pandas_python_dauphine.ipynb)


# Reminders: Last Session
## Exercise: Temperature Data Analysis
**Introduction**
You're provided with a list of temperatures (in degrees Celsius) spanning over a week:
``` python
temperatures = [20.5, 22.3, 19.8, 21.6, 23.2, 18.9, 20.2]
```
Your task is to analyze this data by developing specific functions and then interpreting the results.

**Functions to Develop**



1.   Function `average_temp()`:

*Input:* A list of temperatures.

*Task:* Calculate the average temperature for the week.

*Return:* The average temperature.

2. Function `hot_days_count()`:

*Input:* A list of temperatures.

*Task:* Determine the number of days the temperature was above 21°C.

*Return:* The count of days.

3. Function `coldest_day()`:

*Input:* A list of temperatures.

*Task:* Identify the index of the coldest day (0 for Monday, 6 for Sunday).

*Return:* The index of the day.

**Display the Results**

After you've developed and tested your functions, you should:

- Print the average temperature of the week.
- Print the number of days when the temperature was above 21°C.
- Print the coldest day of the week based on the index (e.g., "The coldest day was Wednesday...").

**Sample output**
``` python
The average temperature for the week is: 20.93°C.
There were 3 days with a temperature above 21°C.
The coldest day was Saturday with a temperature of 18.9°C.
```


In [None]:
## Insert your code here
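
One possible solution is sketched below; it is a minimal version, not the only valid one. The list `DAYS` and the `threshold` parameter are names introduced here for illustration and are not part of the exercise statement.

```python
# Hypothetical helper: day names indexed from 0 (Monday) to 6 (Sunday)
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
temperatures = [20.5, 22.3, 19.8, 21.6, 23.2, 18.9, 20.2]

def average_temp(temps):
    """Return the arithmetic mean of a list of temperatures."""
    return sum(temps) / len(temps)

def hot_days_count(temps, threshold=21):
    """Count the days whose temperature is strictly above the threshold."""
    return sum(1 for t in temps if t > threshold)

def coldest_day(temps):
    """Return the index of the lowest temperature (0 for Monday)."""
    return temps.index(min(temps))

idx = coldest_day(temperatures)
print(f"The average temperature for the week is: {average_temp(temperatures):.2f}°C.")
print(f"There were {hot_days_count(temperatures)} days with a temperature above 21°C.")
print(f"The coldest day was {DAYS[idx]} with a temperature of {temperatures[idx]}°C.")
```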

# Introduction to NumPy
## Context and Objective

Python is almost indispensable in quantitative finance: it is simple, open source, and increasingly popular.  
In this exercise, you will learn to use the NumPy module. NumPy is a Python package specialized in the manipulation of arrays.  
This exercise will only focus on one-dimensional arrays (vectors) and two-dimensional arrays (matrices).

[For more information on NumPy](http://www.numpy.org/)

## Prerequisite Skills

- Basic programming concepts
- Lists
- Basic linear algebra concepts

## Instructions

The exercise is composed of several questions; please answer them in order.

To begin, you need to import the `numpy` module using the alias `np`. Execute the following preamble cell:



In [None]:
import numpy as np

In NumPy, an array is an ordered collection of values that all share one type; the values are usually numbers, but **not only numbers** (strings and booleans work too).

The `np.array()` function defines a **one-dimensional array** from a list. Given a list of values `X`, the command `np.array(X)` converts it into a one-dimensional array.

* Create an array from the list `[1,1,1,1]`

In [None]:
## Insert your code here
my_array = np.array([1,1,1,1])

There are commands to inquire about the variables we are manipulating. Here's a table summarizing these commands:

| Command    | Effect                                         | Example                     |
|------------|------------------------------------------------|-----------------------------|
| type(X)    | Returns the type of the variable X            | type(2) returns `<class 'int'>`      |
| np.shape(X)| Returns the dimension of the variable X       | np.shape([1,2]) returns (2,) |

From a flat list, NumPy creates a one-dimensional array. If you want a different shape, specify it with the command `np.reshape(X, new_shape)`, where `X` is the array to reshape and `new_shape` is a tuple such as `(2, 3)`. The total number of elements must stay the same.
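
As a small illustration of `np.reshape`, the sketch below turns a one-dimensional array of 6 values into a 2x3 matrix. The variable names `flat` and `grid` are chosen here for the example.

```python
import numpy as np

flat = np.arange(6)               # one-dimensional array: [0 1 2 3 4 5]
grid = np.reshape(flat, (2, 3))   # same 6 values arranged as 2 rows and 3 columns

print(np.shape(flat))   # (6,)
print(np.shape(grid))   # (2, 3)
print(grid)
```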

* Create a variable *a* and assign to it an array with the list [1,2,3,4,5]
* Verify that its dimension is indeed (5,)

In [None]:
## Insert your code here
a = np.array([1,2,3,4,5])
print(np.shape(a))


(5,)


Now that we've seen how to get information about arrays, we'd like to create some. There are various commands to generate one-dimensional arrays. Here's a table summarizing them:

| Command               | Meaning                                                        | Example                                    |
|-----------------------|----------------------------------------------------------------|--------------------------------------------|
| np.ones(n)            | Returns an array of dimension (n,) of 1s (as floats)            | np.ones(5) returns array([1., 1., 1., 1., 1.])  |
| np.zeros(n)           | Returns an array of dimension (n,) of 0s (as floats)            | np.zeros(5) returns array([0., 0., 0., 0., 0.]) |
| np.arange(n)          | Returns an array of dimension (n,) of ordered integers from 0 to n-1    | np.arange(5) returns array([0, 1, 2, 3, 4])|
| np.linspace(a,b,n)    | Returns an array of dimension (n,) of n numbers evenly spaced between a and b (both included) | np.linspace(0,5,5) returns array([0., 1.25, 2.5, 3.75, 5.])|
| np.linspace(a,b)      | Returns an array of dimension (50,) of 50 numbers evenly spaced between a and b (both included) |                                            |
| np.concatenate((X,Y)) | Returns an array of dimension (dimX+dimY,) resulting from joining X and Y | np.concatenate((np.array([1]), np.array([0]))) returns array([1, 0])|
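
The sketch below exercises a few of these commands on small inputs. It also shows two details worth keeping in mind: `np.ones` and `np.zeros` produce floats unless a `dtype` is given, and `np.arange` excludes its stop value whereas `np.linspace` includes it. All variable names are chosen for the example.

```python
import numpy as np

ones_float = np.ones(3)              # floats by default
ones_int = np.ones(3, dtype=int)     # integers when dtype is specified

grid = np.linspace(0, 1, 5)          # 5 points, both endpoints included
steps = np.arange(0, 1, 0.25)        # step of 0.25, stop value 1 excluded

joined = np.concatenate((np.zeros(2), ones_int))  # zeros followed by ones

print(ones_float.dtype, ones_int.dtype)
print(grid)
print(steps)
print(joined)
```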

- Create 3 variables a, b, c
- Assign to a an array with 5 zeros
- Assign to b an array with 5 ones
- Assign to c an array of size 10 containing 5 zeros followed by 5 ones, built by reusing a and b.


In [None]:
## Insert your code here
a, b = np.zeros(5), np.ones(5)
c = np.concatenate((a,b))
print('a is ', a)
print('b is ', b)
print('c is ', c)

a is  [0. 0. 0. 0. 0.]
b is  [1. 1. 1. 1. 1.]
c is  [0. 0. 0. 0. 0. 1. 1. 1. 1. 1.]


- Generate two arrays of ordered numbers from 0 to 10 (thus of size 11) using different commands.
- Display them.

In [None]:
## Insert your code here
first_array = np.arange(11)
second_array = np.linspace(0,10, 11)
print('using np.arange :', first_array)
print('using np.linspace :', second_array)

using np.arange : [ 0  1  2  3  4  5  6  7  8  9 10]
using np.linspace : [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]


- Create a list `c` with the integers from 0 to 10 (**a list, not an array**), using the syntax `list(range(...))`.
- Add 5 to all the terms in `c`. *You may need to change the type of c*.
- Display `c`.


In [None]:
## Insert your code here
c = list(range(11))
c = np.array(c) + 5
print(c)

[ 5  6  7  8  9 10 11 12 13 14 15]


We can perform similar operations with matrices, which are 2-dimensional arrays.

Thus, `np.ones((n, p))` returns an n×p matrix filled with ones, and `np.zeros((n, p))` returns an n×p matrix filled with zeros.

`np.diag(v)` returns a square matrix whose diagonal consists of the vector v. Moreover, `np.diag(v, k)` returns a matrix where the k-th diagonal consists of the vector v. k can be positive or negative: if k is positive, the diagonal is shifted to the right (above the main diagonal); otherwise it is shifted to the left (below it).
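
To make the role of `k` concrete, here is a short sketch with a vector of length 3. Note that shifting the diagonal by one position requires a 4x4 matrix to hold all three values. The names `main`, `upper` and `lower` are chosen for the example.

```python
import numpy as np

v = np.array([1, 2, 3])

main = np.diag(v)        # 3x3 matrix, v on the main diagonal
upper = np.diag(v, 1)    # 4x4 matrix, v one step above the main diagonal
lower = np.diag(v, -1)   # 4x4 matrix, v one step below the main diagonal

print(main)
print(upper)
print(lower)
```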

- Create a matrix *mat* of size 5x5 with 1s on the diagonal.
- Display it.

In [None]:
## Insert your code here
mat = np.diag(np.ones(5))
print(mat)

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]


We can use mathematical operators **+**, **-**, on arrays provided that the mathematical operation makes sense.

**Caution: the operators `*` and `/` are also applied term by term; they do not perform matrix multiplication or matrix inversion.**
- Create a 6x6 matrix with 1s on the diagonal and on the sub-diagonal using a mathematical operator.
- Display it.



In [None]:
## Insert your code here
mat_a = np.diag(np.ones(6))
mat_b = np.diag(np.ones(5), - 1)
print(mat_a + mat_b)

[[1. 0. 0. 0. 0. 0.]
 [1. 1. 0. 0. 0. 0.]
 [0. 1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 0. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 0. 0. 1. 1.]]


Accessing specific elements of an array is done similarly to lists. If the array is two-dimensional, two parameters are needed.

*For example*: let X be a two-dimensional array, `X[0, 0]` returns the element located at row 1, column 1. `X[:, 0]` returns the first column. `X[0:3, 0]` returns the first three rows of the first column. This method is referred to as *slicing* in programming.

- Create this matrix using `np.ones()`, `np.diag()` and slicing:
$$
\begin{pmatrix}
5 & 0 & 0 & 0 \\
5 & 1 & 0 & 0 \\
4 & 4 & 4 & 4 \\
5 & 0 & 0 & 1
\end{pmatrix}
$$

In [None]:
## Insert your code here
mat = np.diag(np.ones(4))
mat[:, 0] = 5
mat[2, :] = 4
print(mat)

[[5. 0. 0. 0.]
 [5. 1. 0. 0.]
 [4. 4. 4. 4.]
 [5. 0. 0. 1.]]


With the NumPy module, you can draw random numbers uniformly distributed between 0 and 1. The syntax is as follows: `np.random.rand()` returns a single draw, `np.random.rand(n)` returns a one-dimensional array of n draws, and `np.random.rand(n, p)` returns an n×p matrix of uniformly distributed random draws.
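
Because the draws are random, the values printed in this notebook will differ from one run to the next. If reproducible results are wanted, one option is to fix the seed with `np.random.seed` before drawing, as sketched below. The seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)                # fix the seed so the sequence of draws is repeatable
single = np.random.rand()         # one draw in [0, 1)
row = np.random.rand(3)           # array of 3 draws
matrix = np.random.rand(2, 4)     # 2x4 matrix of draws

np.random.seed(42)                # resetting the seed restarts the same sequence
again = np.random.rand()

print(single == again)
print(row.shape, matrix.shape)
```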

- Display a random number distributed between 0 and 1


In [None]:
## Insert your code here
np.random.rand()

0.4471196989302014

- Display a 5x5 matrix of random numbers distributed between 0 and 1

In [None]:
## Insert your code here
np.random.rand(5,5)

array([[0.7331281 , 0.5465886 , 0.36921581, 0.70896865, 0.56728667],
       [0.85963742, 0.40430734, 0.40035497, 0.04469544, 0.0700531 ],
       [0.54181488, 0.43955629, 0.41156646, 0.816314  , 0.41886596],
       [0.92554736, 0.96303696, 0.2489124 , 0.9898183 , 0.21163568],
       [0.76501586, 0.76357927, 0.8697321 , 0.07801873, 0.81927001]])

- Write a function `random_number()` that takes two integer parameters and returns a random number uniformly distributed between the two integers.
- Call the function `random_number(10, 15)`

*Note: If $X \sim U[0,1]$, then $Y := (b-a)X + a \sim U[a,b]$*



In [None]:
## Insert your code here
def random_number(a, b):
    return (b-a)*np.random.rand() + a

random_number(10, 15)

14.271702935970222

- Write a function `random_matrix()` that takes an integer parameter N and returns a NxN matrix with 1s everywhere except on the diagonal where there are numbers uniformly distributed between 0 and 1.
- Test for N=3 and N=5

> Example: `random_matrix(3)` should return a matrix similar to
$$
\begin{pmatrix}
0.62678954 & 1 & 1 \\
1 & 0.94077299 & 1 \\
1 & 1 & 0.29263003 \\
\end{pmatrix}
$$

In [None]:
## Insert your code here
def random_matrix(N):
    mat_1 = np.ones((N,N))
    mat_2 = np.diag(np.random.rand(N))
    return mat_1 - mat_2

random_matrix(3)

array([[0.30909679, 1.        , 1.        ],
       [1.        , 0.24363663, 1.        ],
       [1.        , 1.        , 0.03332141]])

- In NumPy, operations can be performed between arrays and scalars.
> Example:
```
a = np.array([1, 2, 3])
a * 4 returns array([4, 8, 12])
a + 2 returns array([3, 4, 5])
```

- Create a matrix mat_one of size 5x5 with fives on the diagonal
- Create two matrices mat_two and mat_two_bis of size 5x5 with twos everywhere, in two different ways
- Display the matrices

In [None]:
## Insert your code here
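mat_one = np.diag(5*np.ones(5))   # fives on the diagonal, zeros elsewhere
mat_two = 2*np.ones((5,5))        # first way: multiply a matrix of ones by a scalar
mat_two_bis = np.zeros((5,5))
mat_two_bis[:,:] = 2              # second way: assign 2 to every entry through a slice
print(mat_one)
print(mat_two)
print(mat_two_bis)

[[5. 0. 0. 0. 0.]
 [0. 5. 0. 0. 0.]
 [0. 0. 5. 0. 0.]
 [0. 0. 0. 5. 0.]
 [0. 0. 0. 0. 5.]]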
[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]
[[2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2.]]


In NumPy, operations between arrays are performed element-wise by default.

> Example:
```
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
a * b returns array([4, 10, 18])
```

To perform matrix multiplication in the mathematical sense, use the syntax `np.dot(X, Y)`.

If the dimensions are incompatible (the number of columns of `X` differs from the number of rows of `Y`), NumPy raises a `ValueError`.
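
The sketch below contrasts a valid matrix product with one whose shapes do not match. It also uses the `@` operator, which gives the same result as `np.dot` for two-dimensional arrays. The matrices `A` and `B` are made up for the example.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.arange(6).reshape(3, 2)   # shape (3, 2)

product = np.dot(A, B)           # inner dimensions match (3 and 3): result has shape (2, 2)
same_product = A @ B             # equivalent operator form for 2-D arrays

error_message = None
try:
    np.dot(A, A)                 # (2, 3) times (2, 3): inner dimensions 3 and 2 differ
except ValueError as err:
    error_message = str(err)

print(product)
print("Error raised:", error_message)
```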



- Create a matrix `mat_one` of size 5x5 with random numbers.
- Create a matrix `mat_two` of size 5x5 with ones everywhere.
- Create a matrix `mat_three` and assign to it the element-wise product between `mat_one` and `mat_two`
- Create a matrix mat_four and assign to it the matrix product between `mat_one` and `mat_two`
- Display `mat_three` and `mat_four`

In [None]:
## Insert your code here
mat_one = np.random.rand(5,5)
mat_two = np.ones((5,5))
mat_three = mat_one*mat_two
mat_four = np.dot(mat_one, mat_two)

print('mat_three:', mat_three)
print('mat_four:', mat_four)

mat_three: [[0.86471915 0.94851745 0.19368994 0.70559777 0.58474969]
 [0.71729448 0.6024622  0.23729999 0.71576877 0.57123371]
 [0.35111984 0.00103395 0.70986055 0.05997593 0.67626518]
 [0.77126035 0.37421595 0.81789305 0.33697101 0.23954339]
 [0.83879631 0.59120295 0.34627217 0.27050175 0.47865704]]
mat_four: [[3.29727401 3.29727401 3.29727401 3.29727401 3.29727401]
 [2.84405916 2.84405916 2.84405916 2.84405916 2.84405916]
 [1.79825546 1.79825546 1.79825546 1.79825546 1.79825546]
 [2.53988375 2.53988375 2.53988375 2.53988375 2.53988375]
 [2.52543022 2.52543022 2.52543022 2.52543022 2.52543022]]



- Create a matrix `a` with dimensions 5x2 with arbitrary values
- Create another matrix `b` with dimensions 2x5 with arbitrary values
- Return the meaningful product of the two matrices here



In [None]:
## Insert your code here
a = np.random.rand(5,2)
b = np.random.rand(2, 5)
print(np.dot(a,b))

[[0.51620165 0.20602091 0.19899608 0.44107435 0.31266037]
 [0.40484983 0.19542903 0.21518839 0.295991   0.29675906]
 [1.36863879 0.7606343  0.90205718 0.8531536  1.1554461 ]
 [0.58174415 0.34404615 0.41963797 0.33204403 0.52270151]
 [1.21070998 0.64382419 0.74724998 0.79754863 0.97789836]]


The use of logical operators is possible via NumPy.

- Create two matrices *mat_one* and *mat_two* of size 5x5 with random values.
- Using the operator `*` and the logical operator '`==`', return a 5x5 matrix of True.
- Using matrix multiplication and the logical operator '`==`', return a 5x5 matrix of False.


In [None]:
## Insert your code here
mat_one, mat_two = np.random.rand(5,5), np.random.rand(5,5)
print(0*mat_one == 0*mat_two)
print(np.dot(mat_one, mat_two) == np.dot(mat_two, mat_one))  # matrix products generally do not commute

[[ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]
[[False False False False False]
 [False False False False False]
 [False False False False False]
 [False False False False False]
 [False False False False False]]


Lastly, it is possible to analyze data with NumPy. The table below summarizes some useful functions:

| Command   | Meaning                 |
|-----------|-------------------------|
| np.mean(X) | returns the mean of X   |
| np.var(X)  | returns the variance of X|
| np.std(X)  | returns the standard deviation of X |
| X.sum()  | sums the elements of X   |
| X.prod() | multiplies the elements of X |
| X.min()  | returns the minimum of X |
| X.max()  | returns the maximum of X |

Furthermore, when working with matrices, you can pass an `axis` argument to choose the direction along which the operation is applied. For example:

```
mat = np.random.rand(5, 5)
np.mean(mat, axis = 0)  ## averages over the rows: returns the mean of each column
np.mean(mat, axis = 1) ## averages over the columns: returns the mean of each row
mat.sum(axis = 0) ## sums over the rows: returns the sum of each column
```
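
The effect of `axis` is easiest to see on a small non-square matrix, where the shape of the result shows which direction was collapsed. In this sketch, `axis=0` collapses the rows and yields one value per column, while `axis=1` collapses the columns and yields one value per row.

```python
import numpy as np

mat = np.array([[1, 2, 3],
                [4, 5, 6]])          # 2 rows, 3 columns

col_means = np.mean(mat, axis=0)     # one mean per column: 3 values
row_means = np.mean(mat, axis=1)     # one mean per row: 2 values
col_sums = mat.sum(axis=0)           # one sum per column: 3 values
total = mat.sum()                    # no axis: sum of all elements

print(col_means)
print(row_means)
print(col_sums, total)
```
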
- Verify that the sample mean of draws from the uniform distribution on [0,1] is close to 0.5 for a large number of draws.

**Note:**

*As the number of draws increases, the mean value of the uniformly distributed random values should converge to 0.5 according to the Law of Large Numbers.*

In [None]:
## Insert your code here
n = 100000
np.mean(np.random.rand(n))

0.5010119449037621

- Compute a finite truncation of this product (for a large number of terms) using only NumPy methods, and display the result

$$
\frac{\pi}2=\prod_{n=1}^{\infty}\frac{4n^2}{4n^2-1}
$$

- Compare with `np.pi / 2`

---

In [None]:
## Insert your code here
n = 10000
k = np.linspace(1, n, n)  # k = 1, 2, ..., n as floats
print(np.prod((4*k*k)/(4*k*k - 1)))
print(np.pi/2)  # np.math is not available in recent NumPy versions; np.pi is the supported constant

1.5707570593409783
1.5707963267948966


## To go further

**Exercise 1:**
- Create an array of 10 zeros.
- Create an array of 10 ones.
- Create an array of 10 fives.
- Create an array of the integers from 10 to 50.

**Exercise 2:**
- Create an array of all even integers from 10 to 50.
- Create a 3x3 matrix with values ranging from 0 to 8.
- Create a 3x3 identity matrix.
- Use indexing to replace the top row of the matrix from Exercise 2 with 9s.

**Exercise 3**
- Generate a random array of size 25. Find its mean.
- Generate a random matrix of size 5x5. Find the sum of all the elements, the sum of the columns, and the sum of the rows.

**Exercise 4**
- Multiply a 5x3 matrix by a 3x2 matrix using matrix multiplication.
- Multiply a 5x5 matrix by a 5x1 vector.

**Exercise 5**
- Create an array of 10 random numbers. Replace all the values less than 0.5 with 0.

# Introduction to Pandas

In this exercise, you will learn to use the pandas module. Pandas is a Python package specialized in data manipulation.

[For more information on pandas click here](http://pandas.pydata.org/)

#### Required Skills

- Basic programming knowledge
- Lists
- Linear algebra concepts
- Introduction to NumPy

#### Instructions

The exercise consists of several questions; complete them in order.

To begin, you need to import the `pandas` module under the abbreviated name `pd`. Therefore, execute this preamble cell. NumPy will also be used in this exercise.



In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np

> A Series is a one-dimensional labeled array that can hold any data type. The labels are collectively referred to as the index. Its syntax is as follows: `pd.Series(X)`, where `X` is a list or an array.
- Create a Series of 5 data points from random numbers uniformly distributed between 0 and 1.

In [None]:
## Insert your code here
pd.Series(np.random.rand(5))

0    0.275694
1    0.082329
2    0.507153
3    0.311576
4    0.649334
dtype: float64

> It is possible to specify the indices using the following syntax: `pd.Series(X, index = Y)`, where `X` is the data list and `Y` is a list of associated indices.
- Create a Series of 4 data points, each with a value of 1, and specify the following list of indices: ['a', 'b', 'c', 'd'].

In [None]:
## Insert your code here
pd.Series(np.ones(4), index = ['a', 'b', 'c', 'd'])


a    1.0
b    1.0
c    1.0
d    1.0
dtype: float64

> By using indices, you can access the data in the Series in the same way you access elements in a list.

Slicing is also possible.
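
A short sketch of label-based access and slicing on a Series follows. One point that differs from lists: a slice written with labels includes both endpoints, while a positional slice through `.iloc` excludes the stop position. The Series `s` and its values are made up for the example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

one_value = s['b']            # access by label
by_label = s['b':'c']         # label slice: both 'b' and 'c' are included
by_position = s.iloc[0:2]     # positional slice: positions 0 and 1, stop excluded

print(one_value)
print(by_label)
print(by_position)
```
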
- Create a variable named series_one and assign it a Series created from a list of 4 random numbers distributed between 0 and 1.
- Use the list of indices from the previous question: ['a', 'b', 'c', 'd'].
- Retrieve the first element of series_one using the corresponding index.

In [None]:
## Insert your code here
series_one = pd.Series(np.random.rand(4), index = ['a', 'b', 'c', 'd'])
print(series_one['a'])

0.5701468749380846


- Change the fourth data point of `series_one` to 0.
- Display `series_one`.

In [None]:
## Insert your code here
series_one['d'] = 0  # 'd' is the label of the fourth data point
print(series_one)

a    0.570147
b    0.140269
c    0.493091
d    0.000000
dtype: float64


pandas displays `dtype: float64` whenever these Series are printed. This is the data type: here floating-point numbers, encoded on 64 bits. You can specify the data type you want to handle when creating a Series.

Furthermore, you can name the Series using the `name` parameter.

- Create a variable `series_two` from an array of four ones.
- Specify the data type `dtype` as `int`.
- Name this Series `my_series`.
- Display the Series.

In [None]:
## Insert your code here
series_two = pd.Series(np.ones(4), dtype = int, name = 'my_series')
print(series_two)

0    1
1    1
2    1
3    1
Name: my_series, dtype: int64


The `describe()` function returns a variety of information about the Series it is applied to.

- Create a variable `series_three` from an array of 20 random numbers uniformly distributed between 0 and 1.
- Display information about the Series using `describe()`.


In [None]:
## Insert your code here
series_three = pd.Series(np.random.rand(20))
series_three.describe()

count    20.000000
mean      0.328983
std       0.254890
min       0.072618
25%       0.166325
50%       0.240520
75%       0.393073
max       0.918765
dtype: float64

It's possible to add Series together. Pandas will sum the data with matching *indices*. If an *index* is missing in one of the Series, the resulting sum Series will display `NaN` (Not a Number) at that index.

- Create a Series `series_four` from an array of 19 random numbers uniformly distributed between 0 and 1.
- Sum `series_three` and `series_four`.

In [None]:
## Insert your code here
series_four = pd.Series(np.random.rand(19))
print(series_three + series_four)

0     0.989894
1     0.879207
2     0.246522
3     0.985539
4     1.149596
5     0.764882
6     0.936611
7     0.615035
8     0.949931
9     0.358049
10    0.606920
11    1.273804
12    0.839642
13    0.450291
14    0.754320
15    0.630782
16    0.556529
17    0.583265
18    0.830415
19         NaN
dtype: float64


However, you can specify a particular value to use where the *indices* do not match during a summation. The following syntax is used:
```
## Assume a and b are two Series
a.add(b, fill_value = 0)  ## we decide to replace with 0
```
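
The difference between `+` and `add` with `fill_value` can be seen on two short Series of different lengths. This is an illustrative sketch; the Series `x` and `y` are made up for the example.

```python
import numpy as np
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0])    # indices 0, 1, 2
y = pd.Series([10.0, 20.0])       # indices 0, 1 only

plain = x + y                     # index 2 is missing in y, so the result there is NaN
filled = x.add(y, fill_value=0)   # the missing value in y is treated as 0

print(plain)
print(filled)
```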

- Sum `series_three` and `series_four` by specifying fill_value equal to 100.

In [None]:
## Insert your code here
series_three.add(series_four, fill_value = 100)

0       0.989894
1       0.879207
2       0.246522
3       0.985539
4       1.149596
5       0.764882
6       0.936611
7       0.615035
8       0.949931
9       0.358049
10      0.606920
11      1.273804
12      0.839642
13      0.450291
14      0.754320
15      0.630782
16      0.556529
17      0.583265
18      0.830415
19    100.260295
dtype: float64

Lastly, it's possible to use mathematical operators on Series. The following syntax is used:
```
# Assume a is a Series
a[a >= 0.5]  ## returns the data from a greater than or equal to 0.5
a * 2  ## multiplies the data from a by two
```
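
Filtering works in two steps: the comparison produces a Series of booleans (a mask), and indexing with that mask keeps only the entries where it is `True`, along with their original indices. The sketch below makes the intermediate mask explicit; the values are made up for the example.

```python
import pandas as pd

a = pd.Series([0.2, 0.7, 0.5, 0.9])

mask = a >= 0.5        # boolean Series: False, True, True, True
selected = a[mask]     # keeps indices 1, 2 and 3
doubled = a * 2        # element-wise multiplication, indices unchanged

print(mask)
print(selected)
print(doubled)
```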

- Create a variable `a`, and assign it a Series of 20 numbers uniformly distributed between 1 and 20.
- Display the Series with data strictly greater than 10.

In [None]:
## Insert your code here
a = pd.Series(np.random.uniform(1,20, 20))
print(a[a > 10])

0     19.764755
1     14.680251
2     14.027396
3     16.106461
5     17.298290
6     13.474510
7     15.428416
9     12.414498
13    10.083957
16    10.018372
17    16.821738
19    19.636069
dtype: float64


To conclude on the manipulation of Series, we propose the following instructions:

- Create an *index* of size 20 that includes "boy" or "girl" randomly distributed.
- Create an array of size 20 that displays ages ranging from 3 to 16 years, randomly distributed.
- Create a Series `cousins` with `name = "my cousins"`, the index created previously, and data from the array.
- Retrieve the Series of "boys" into a Series `boys` and the Series of "girls" into a variable `girls`.
- Display information about these two Series.


In [None]:
## Insert your code here
index_series = [["boy", "girl"][x] for x in np.random.randint(0, 2, 20)]
age_series = np.random.randint(3, 17, 20)  # the upper bound is exclusive, so 17 allows ages up to 16
cousins = pd.Series(age_series, index = index_series, name = 'my cousins')
boys = cousins["boy"]
girls = cousins['girl']
boys.describe()


count     7.000000
mean      9.000000
std       2.236068
min       5.000000
25%       8.000000
50%      10.000000
75%      10.500000
max      11.000000
Name: my cousins, dtype: float64

In [None]:
girls.describe()

count    13.000000
mean      8.461538
std       3.596651
min       3.000000
25%       6.000000
50%       9.000000
75%      11.000000
max      15.000000
Name: my cousins, dtype: float64

Now we turn our attention to DataFrames. DataFrames are the two-dimensional extension of Series. Thus, the *indices* are shared among the columns of the DataFrame.

A common way to create a DataFrame is by using a dictionary. The syntax is as follows:
```
pd.DataFrame({'Name of the first column': data, 'Name of the second column': data})
```
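
As a minimal illustration of this syntax, the sketch below builds a small DataFrame from a dictionary of lists. The values are hypothetical and are not the data of the previous question.

```python
import pandas as pd

# Hypothetical data, for illustration only
example = pd.DataFrame({
    'Gender': ['boy', 'girl', 'girl'],
    'Age': [7, 12, 4],
})

print(example)
print(example.shape)            # (number of rows, number of columns)
print(list(example.columns))    # column names come from the dictionary keys
```

Each key becomes a column name, each list becomes a column, and pandas assigns the default index 0, 1, 2 to the rows.
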
- Create a DataFrame data_one with two columns: 'Gender' and 'Age', using the data from the previous question.
- Display the DataFrame.

In [None]:
## Insert your code here

- Create a list *dominant_hand* of size 20 that contains "left-handed" or "right-handed" distributed randomly.
- Add this list as a new column to *data_one*.
  *Use the command `data_one['Dominant Hand'] = dominant_hand`*


In [None]:
## Insert your code here

*Slicing* is possible with DataFrames.
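
The sketch below shows two common forms of selection on a DataFrame: a row slice by position and a column selection by a list of names. The DataFrame `df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({
    'Gender': ['boy', 'girl', 'girl', 'boy'],
    'Age': [7, 12, 4, 9],
    'Dominant Hand': ['right-handed', 'left-handed', 'right-handed', 'right-handed'],
})

first_rows = df[0:2]                            # row slice by position: rows 0 and 1
same_rows = df.head(2)                          # head(n) returns the first n rows
two_columns = df[['Gender', 'Dominant Hand']]   # a list of names selects columns

print(first_rows)
print(two_columns)
```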

- Display the first 5 rows of *data_one*.

In [None]:
## Insert your code here

- Display the columns "Gender" and "Dominant Hand".


It is possible to concatenate two DataFrames using the command `pd.concat()`. The syntax is as follows:
```
# Assume X and Y are two DataFrames
pd.concat([X,Y], axis = 0)  ## concatenates vertically
pd.concat([X,Y], axis = 1)  ## concatenates horizontally
```
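
The two directions of `pd.concat` can be compared on small DataFrames, as in the sketch below. Vertical concatenation keeps the original row labels, so duplicates appear unless `ignore_index=True` is passed. The DataFrames `X`, `Y` and `Z` are made up for the example.

```python
import pandas as pd

X = pd.DataFrame({'A': [1, 2]})
Y = pd.DataFrame({'A': [3, 4]})
Z = pd.DataFrame({'B': ['x', 'y']})

stacked = pd.concat([X, Y], axis=0)                         # 4 rows, row labels 0, 1, 0, 1
renumbered = pd.concat([X, Y], axis=0, ignore_index=True)   # 4 rows, row labels 0, 1, 2, 3
side_by_side = pd.concat([X, Z], axis=1)                    # 2 rows, columns A and B

print(stacked)
print(renumbered)
print(side_by_side)
```
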
- Create a list of size 20 that contains "red", "blue", or "green" distributed randomly.
- Create a DataFrame `data_two` from the previous list.
- Name the column using the command `data_two.columns = ['Name of Column']`.
- Concatenate `data_one` and `data_two` into a new variable `data_three`.


In [None]:
## Insert your code here

## **Open Exercise**
We advise you to use the pandas documentation or Stack Overflow to find the answers to the following exercises.
#### 1. Basic DataFrame Operations:

- Create a DataFrame from a dictionary with keys: 'Name', 'Age', 'City' and populate it with some data.
- Display the first 5 rows of the DataFrame.
- Display the last 3 rows of the DataFrame.
- Display the data types of each column.

#### 2. Indexing and Selection:

- Select the 'Name' and 'City' columns from the DataFrame.
- Select the row at index 2 from the DataFrame.
- Select the rows where 'Age' is greater than 25.

#### 3. Sorting and Ranking:

- Sort the DataFrame based on 'Age' in descending order.
- Rank the DataFrame based on 'Age', with the oldest as rank 1.

#### 4. Missing Data:

- Introduce some missing values in the DataFrame using np.nan.
- Fill the missing values with the mean of the non-missing values.
- Drop the rows with missing values.

#### 5. Grouping and Aggregation:

- Group the DataFrame by 'City' and calculate the mean age for each city.
- Find the maximum and minimum age for each city.

#### 6. Merging, Joining, and Concatenating:

- Create a second DataFrame with keys: 'Name', 'Job Title'.
- Merge the two DataFrames on the 'Name' column.
- Concatenate the two DataFrames vertically and then horizontally.

In [None]:
## Insert your code here