In [None]:
import pandas as pd
import numpy as np

**Series**

A Series is a one-dimensional labeled array. It can be created from a list, a dictionary, a NumPy array, or a scalar value, and it supports indexing, slicing, and element-wise arithmetic.

In [None]:
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s)

a    1.0
b    3.0
c    5.0
d    NaN
e    6.0
f    8.0
dtype: float64
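The other ways of building a Series mentioned above can be sketched as follows. The variable names (`s_dict`, `s_arr`, `s_scalar`) and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels
s_dict = pd.Series({'a': 1, 'b': 3, 'c': 5})

# From a NumPy array: a default integer index (0, 1, 2, ...) is assigned
s_arr = pd.Series(np.array([10, 20, 30]))

# From a scalar: the value is repeated for every index label
s_scalar = pd.Series(7, index=['x', 'y', 'z'])

# Arithmetic is element-wise; slicing by position works like a list slice
doubled = s_dict * 2
first_two = s_dict.iloc[:2]

print(doubled)
print(first_two)
```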


Accessing elements of a Series

In [None]:
s['a']

1.0

In [None]:
s[['a','e', 'b']]

a    1.0
e    6.0
b    3.0
dtype: float64

In [None]:
s.loc['a']

1.0

In [None]:

s.loc[['a','b']]

a    1.0
b    3.0
dtype: float64

In [None]:
s.iloc[[0,1]]

a    1.0
b    3.0
dtype: float64
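One difference between loc and iloc that is easy to miss shows up when slicing: a label slice with loc includes its end label, while a position slice with iloc excludes its end position. A small sketch using the same Series as above (`by_label` and `by_position` are illustrative names):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = s.loc['a':'c']    # label slice: 'c' is included
by_position = s.iloc[0:2]    # position slice: position 2 is excluded

print(by_label)
print(by_position)
```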

**DataFrame**

A DataFrame is a 2-dimensional labeled data structure in pandas. It is similar to a spreadsheet or a SQL table, and each column can hold a different data type. A DataFrame is defined by three components: the values, the row index, and the column labels.

In [None]:
data = {'col1': [1, 2, 3, 4], 'col2': [4, 5, 6, 7], 'col3': [7, 8, 9, 10]}
df = pd.DataFrame(data)
print(df)

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
3     4     7    10
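The pieces that make up a DataFrame can be inspected directly. The sketch below rebuilds the same DataFrame and looks at its column labels, row index, and underlying values:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': [4, 5, 6, 7],
                   'col3': [7, 8, 9, 10]})

print(df.columns.tolist())   # column labels
print(df.index.tolist())     # row labels (default integer index)
print(df.to_numpy().shape)   # the values as a 2D NumPy array
print(df.dtypes)             # one data type per column
```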


In [None]:
df.loc[[0]]

   col1  col2  col3
0     1     4     7


In [None]:
df.loc[[0], ['col1', 'col2']]

   col1  col2
0     1     4


In [None]:
df.loc[:,['col1', 'col3']]

   col1  col3
0     1     7
1     2     8
2     3     9
3     4    10


In [None]:
df.loc[ :  , :  ]

   col1  col2  col3
0     1     4     7
1     2     5     8
2     3     6     9
3     4     7    10


In [None]:
df.iloc[ : , [1,2]]

   col2  col3
0     4     7
1     5     8
2     6     9
3     7    10


From a dictionary: You can create a DataFrame from a dictionary of equal-length lists or numpy arrays. The keys of the dictionary will become the column names of the DataFrame.



In [None]:
data = {'name': ['John', 'Jane', 'Jim', 'Joan'],
        'age': [32, 28, 41, 35],
        'city': ['New York', 'London', 'Berlin', 'Paris']}
df = pd.DataFrame(data)
print(df)

   name  age      city
0  John   32  New York
1  Jane   28    London
2   Jim   41    Berlin
3  Joan   35     Paris


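From a NumPy array: You can create a DataFrame from a 2D NumPy array. If you do not pass column names, the columns are assigned default integer names (0, 1, 2, ...); the example below passes them explicitly with the columns argument.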

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(arr, columns=['A', 'B', 'C'])
print(df)

   A  B  C
0  1  2  3
1  4  5  6
2  7  8  9
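For comparison, here is a small sketch of what happens when no column names are given. The name `df_default` is illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Without the columns argument, pandas labels the columns 0, 1, 2
df_default = pd.DataFrame(arr)
print(df_default)
print(df_default.columns.tolist())
```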


From a CSV file: You can create a DataFrame from a CSV file using the pd.read_csv() function.

In [None]:
df = pd.read_csv('data.csv')
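The file 'data.csv' is not included here, so the call above only runs if such a file exists in the working directory. As a self-contained sketch, pd.read_csv() also accepts a file-like object, which makes it possible to try it on CSV text held in memory. The content of `csv_text` is made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content, standing in for a file on disk
csv_text = "name,age,city\nJohn,32,New York\nJane,28,London\n"

# The first line is used as the header; column types are inferred
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.dtypes)
```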

From a list of dictionaries: You can create a DataFrame from a list of dictionaries where each dictionary represents a row of the DataFrame.

In [None]:
data = [{'name': 'John', 'age': 32, 'city': 'New York'},
        {'name': 'Jane', 'age': 28, 'city': 'London'},
        {'name': 'Jim', 'age': 41, 'city': 'Berlin'},
        {'name': 'Joan', 'age': 35, 'city': 'Paris'}]
df = pd.DataFrame(data)

In [None]:
df

   name  age      city
0  John   32  New York
1  Jane   28    London
2   Jim   41    Berlin
3  Joan   35     Paris


From a Series: You can create a DataFrame from a Series by using the to_frame() method.

In [None]:
s1 = pd.Series([1, 2, 3, 4], name='A')
print(s1)
df = s1.to_frame()
print(df)

0    1
1    2
2    3
3    4
Name: A, dtype: int64
   A
0  1
1  2
2  3
3  4


In [None]:
s2 = pd.Series([1, 2, 3, 4], name='B')
s3 = pd.Series(['a', 'b', 'c', 'd'], name='C')
df = pd.DataFrame([s1, s2, s3])
print(df)

   0  1  2  3
A  1  2  3  4
B  1  2  3  4
C  a  b  c  d
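Note that passing a list of Series to pd.DataFrame() turns each Series into a row, as the output above shows. If each Series should become a column instead, one option is pd.concat() with axis=1. A minimal sketch (`df_cols` is an illustrative name):

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], name='A')
s3 = pd.Series(['a', 'b', 'c', 'd'], name='C')

# axis=1 places the Series side by side; their names become the column labels
df_cols = pd.concat([s1, s3], axis=1)
print(df_cols)
```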


**Accessing Elements in a DataFrame:**

a. Create a DataFrame with the following data:

data = {'Name': ['John', 'Jane', 'Jim', 'Jessica'],
         'Age': [30, 28, 25, 32],
         'Country': ['USA', 'Canada', 'UK', 'Australia']}

b. Use the loc method to access the record for 'Jane'

c. Use the iloc method to access the record for 'Jim'

d. Use the loc method to access the 'Age' column for 'Jessica'

e. Use the iloc method to access the 'Country' column for the first two records

In [None]:
import pandas as pd

data = {'name': ['john', 'jane', 'jim', 'jessica'],
        'age': [30, 28, 25, 32],
        'country': ['usa', 'canada', 'uk', 'australia']}
df = pd.DataFrame(data)
print(df)

      name  age    country
0     john   30        usa
1     jane   28     canada
2      jim   25         uk
3  jessica   32  australia


In [None]:
df.loc[[1] , : ]

   name  age  country
1  jane   28   canada


In [None]:
df.iloc[[2] , :]

   name  age  country
2   jim   25       uk


In [None]:
df.loc[ [3] , ["age"]  ]

   age
3   32


In [None]:
df.iloc[ [3] , [1]  ]

   age
3   32
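The cells above do not cover part e (the 'Country' column for the first two records with iloc). One possible answer, as a sketch that assumes the lowercase column names used in the solution cells (`first_two_countries` is an illustrative name):

```python
import pandas as pd

data = {'name': ['john', 'jane', 'jim', 'jessica'],
        'age': [30, 28, 25, 32],
        'country': ['usa', 'canada', 'uk', 'australia']}
df = pd.DataFrame(data)

# Rows at positions 0 and 1, column at position 2 ('country')
first_two_countries = df.iloc[[0, 1], [2]]
print(first_two_countries)
```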


**Accessing Elements in a Series:**

a. Create a Series with the following data:

data = [30, 28, 25, 32]

b. Use the index operator (square brackets) to access the second element in the Series

c. Use the loc method to access the third element in the Series

d. Use the iloc method to access the first element in the Series

e. Use the iloc method to access the last two elements in the Series

Note: You can use the following code to create the DataFrame and Series in the exercises:

Creating a DataFrame :
df = pd.DataFrame(data, columns=['Name', 'Age', 'Country'])

Creating a Series
s = pd.Series(data)

**Exercise 2:**

Import pandas and create a DataFrame df with the following data:

data = {'Name': ['John', 'Jane', 'Jim', 'Jack', 'Jill'],
        'Age': [34, 28, 41, 38, 31],
        'Country': ['USA', 'UK', 'Canada', 'Australia', 'India']}

Access the first three rows of the df DataFrame using iloc.

Access the second and third rows of the Age column using loc.

Access the second row of the Name and Age columns using iloc.

Try accessing the second and third rows of the Country column using both loc and iloc.

Try accessing a non-existing row or column from the DataFrame and observe the error message.

Try accessing a non-existing element in a Series and observe the error message.

**Access with conditions**

Pandas lets you select data from a DataFrame or Series based on conditions. The two main indexers for this are loc and iloc.

**1) DataFrame.loc[conditions, columns]**

The loc indexer accesses data in a DataFrame by row and column labels. It also accepts a boolean condition in place of the row labels, which selects the rows where the condition is True. For example:

df.loc[df["column_name"] > value, ["column_name_1", "column_name_2"]]

This will select all rows where the values in the column_name column are greater than value, and only the column_name_1 and column_name_2 columns will be returned.

In [None]:
import pandas as pd

# create a sample DataFrame
df = pd.DataFrame({'A': [3, 2, 0, 1, 5],
                   'B': [10, 20, 30, 40, 50],
                   'C': ['A', 'B', 'C', 'D', 'E']})

print(df)

   A   B  C
0  3  10  A
1  2  20  B
2  0  30  C
3  1  40  D
4  5  50  E


In [None]:
# use the .loc function with a condition
result = df.loc[(df['A'] > 3) & (df['A'] < 200)]

print(result)

   A   B  C
4  5  50  E


In this example, .loc selects rows from the DataFrame based on the combined condition (df['A'] > 3) & (df['A'] < 200). The resulting DataFrame contains only the rows where the value in column A is greater than 3 and less than 200. Each condition is wrapped in parentheses because & binds more tightly than the comparison operators.

In [None]:
# use the .loc function with a condition
result = df.loc[df['C'] == 'E']

print(result)

   A   B  C
4  5  50  E


In [None]:
# use the .loc function with a condition and select specific columns
result = df.loc[df['A'] > 2, ['A', 'C']]

print(result)

print("the type of the result is", type(result))

   A  C
0  3  A
4  5  E
the type of the result is <class 'pandas.core.frame.DataFrame'>


NOTE: the index attribute of a DataFrame or a Series returns the index labels of its rows.

In [None]:
print(result.index)

Int64Index([0, 4], dtype='int64')


In this example, index returns an iterable (similar to a list or a NumPy array) that contains the row labels of the result.

This is important to know in order to use iloc with conditions.

**2) DataFrame.iloc[conditions, columns]**

Reminder: the iloc indexer accesses data by the integer position of the rows and columns.

In [None]:
import pandas as pd

# Create a sample dataframe
data = {'Name': ['John', 'Jane', 'Jim', 'Joan'],
        'Age': [26, 27, 28, 29],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)

print(df)

   Name  Age         City
0  John   26     New York
1  Jane   27  Los Angeles
2   Jim   28      Chicago
3  Joan   29      Houston


In [None]:
# Use iloc to select rows where 'Age' is greater than 27 or 'Name' is 'John'
selected_rows = df.iloc[df.index[(df['Age'] > 27) | (df['Name'] == 'John')], :]
print(selected_rows)

   Name  Age      City
0  John   26  New York
2   Jim   28   Chicago
3  Joan   29   Houston


In [None]:
list_res = df.index[(df['Age'] > 27) | (df['Name'] == 'John')]
df.iloc[list_res, :]

   Name  Age      City
0  John   26  New York
2   Jim   28   Chicago
3  Joan   29   Houston


In this example, Python first evaluates (df['Age'] > 27), which creates a boolean Series:

In [None]:
df['Age'] > 27

0    False
1    False
2     True
3     True
Name: Age, dtype: bool

As you can see, the boolean Series has the value True where the condition is met and False where it is not.

Let's save this Series in a variable "bool_serie" for clarity.

In [None]:
bool_serie = df['Age'] > 27

Second, we index df.index with the boolean Series to extract the row labels where the condition is True:

In [None]:
df.index[bool_serie]

Int64Index([2, 3], dtype='int64')

Let's save these labels in a variable "indices" for clarity.

In [None]:
indices = df.index[bool_serie]

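Finally, we pass these indices to iloc. This works here because the DataFrame has the default integer index (0, 1, 2, ...), so the row labels are the same as the row positions.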

In [None]:
# Select all the columns
df.iloc[indices, :]

   Name  Age     City
2   Jim   28  Chicago
3  Joan   29  Houston


In [None]:
# Select the first and last column
df.iloc[indices, [0,-1]]

   Name     City
2   Jim  Chicago
3  Joan  Houston


In [None]:
# Select only the second column
df.iloc[indices, [1]]

   Age
2   28
3   29
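The iloc pattern above passes row labels taken from df.index, which matches row positions only when the DataFrame has the default integer index. When the index is something else, a safer option is to give iloc the boolean mask as a NumPy array, or to convert the mask to positions first. The sketch below uses a made-up index (10, 20, 30, 40) to show the difference; the names `mask`, `positions`, and the `by_*` results are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Jim', 'Joan'],
                   'Age': [26, 27, 28, 29]},
                  index=[10, 20, 30, 40])   # non-default row labels

mask = df['Age'] > 27

# df.index[mask] holds the labels 30 and 40, which are not valid positions here
print(df.index[mask].tolist())

# loc works directly with the boolean Series
by_loc = df.loc[mask]

# iloc works with the mask as a plain boolean array
by_iloc = df.iloc[mask.to_numpy()]

# or with the integer positions where the mask is True
positions = np.flatnonzero(mask.to_numpy())
by_positions = df.iloc[positions]

print(by_positions)
```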
