<a href="https://colab.research.google.com/github/davidofitaly/notes_03_python_in_data_analysis/blob/main/02_working_with_the_pandas_library.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import numpy as np

## Introduction to the data structures of the pandas library

### **`Series` in Pandas**


##### A `Series` is a one-dimensional labeled array that can hold any data type (integers, floats, strings, or arbitrary Python objects). It resembles a NumPy array but carries an index, so values can be looked up and aligned by label.

#### **Key Features**
- Stores data along with an index.
- Supports various data types (int, float, string, etc.).
- Allows for easy indexing and slicing.
- Supports vectorized operations similar to NumPy arrays.

#### **Creating a `Series`**
A `Series` can be created from lists, NumPy arrays, or dictionaries.
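
The list and dictionary cases are covered by the examples in this section; the NumPy case can be sketched as follows. The variable names and sample values here are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, used only for illustration
temperatures = np.array([21.5, 19.0, 23.25])

# Build a Series from the array and attach custom index labels
series_from_array = pd.Series(temperatures, index=['mon', 'tue', 'wed'])
print(series_from_array)

# Vectorized arithmetic keeps the labels attached to the results
doubled_series = series_from_array * 2
print(doubled_series)
```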

#### Examples 2.1



*   object_1



In [2]:
object_1 = pd.Series([2, 2, 7, 4, 0])  # Creates a Pandas Series from a list of integers
print(object_1) # Displays the created Series with default integer index

0    2
1    2
2    7
3    4
4    0
dtype: int64


In [3]:
object_1.array # Returns the underlying array of the Pandas Series

<NumpyExtensionArray>
[2, 2, 7, 4, 0]
Length: 5, dtype: int64

In [4]:
object_1.index # Returns the index of the Pandas Series

RangeIndex(start=0, stop=5, step=1)

In [5]:
object_1[object_1 > 2] # Filters and returns elements from the Series where values are greater than 2

2    7
3    4
dtype: int64


In [6]:
object_1 * 2 # Multiplies each element in the Series by 2

0     4
1     4
2    14
3     8
4     0
dtype: int64


In [7]:
'c' in object_1 # Checks whether the label 'c' is in the index of the Series (the `in` operator tests index labels, not values)

False



*   object_2



In [8]:
object_2 = pd.Series([2,5,7,1], index=['a', 'b', 'c', 'd']) # Creates a Pandas Series with custom indices
print(object_2) # Displays the Series with assigned index labels

a    2
b    5
c    7
d    1
dtype: int64


In [9]:
object_2.array # Returns the underlying array of the Pandas Series

<NumpyExtensionArray>
[2, 5, 7, 1]
Length: 4, dtype: int64

In [10]:
object_2.index # Returns the index of the Pandas Series

Index(['a', 'b', 'c', 'd'], dtype='object')

In [11]:
object_2['c'] # Accesses the value associated with the index 'c'

7

In [12]:
object_2[['c', 'd']] # Selects and displays elements from the Series with indices 'c' and 'd'

c    7
d    1
dtype: int64





*   object_3





In [13]:
food_data = {'Kebab': 1000, 'Pizza': 750, 'Pasta': 475} # Creates a dictionary with food items as keys and their values

object_3 = pd.Series(food_data) # Converts the dictionary into a pandas Series
print(object_3) # Displays the Series with labeled indices

Kebab    1000
Pizza     750
Pasta     475
dtype: int64


In [14]:
object_3.to_dict() # Converts the Series back to a dictionary

{'Kebab': 1000, 'Pizza': 750, 'Pasta': 475}



*   object_4


In [15]:
food_list = ['Tuna', 'Burger', 'Kebab']

object_4 = pd.Series(food_data, index=food_list) # Creates a Series from a dictionary with custom indices
print(object_4) # Displays the Series with labeled indices

Tuna         NaN
Burger       NaN
Kebab     1000.0
dtype: float64


In [16]:
pd.isnull(object_4) # Checks for null values in the Series and returns a boolean mask

Tuna       True
Burger     True
Kebab     False
dtype: bool


In [17]:
pd.notnull(object_4) # Checks for non-null values in the Series and returns a boolean mask

Tuna      False
Burger    False
Kebab      True
dtype: bool


In [18]:
object_4.name = 'food' # Sets the name of the Series to 'food'
object_4.index.name = 'food_name' # Sets the name of the index to 'food_name'
print(object_4) # Displays the Series with labeled indices

food_name
Tuna         NaN
Burger       NaN
Kebab     1000.0
Name: food, dtype: float64


### DataFrame in pandas


- A `DataFrame` is a two-dimensional, size-mutable, and heterogeneous data structure in pandas.
- It consists of rows and columns, similar to a table or spreadsheet.
- Each column in a `DataFrame` is a `Series`, and columns can have different data types.
- It supports operations like indexing, filtering, and statistical computations.

**Key Features:**
- Labeled axes (`index` for rows, `columns` for column names).
- Supports missing data handling.
- Can be created from dictionaries, lists, NumPy arrays, or external data sources (CSV, SQL, etc.).
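
As a minimal sketch of one of the construction routes listed above, a `DataFrame` can also be built from a list of dictionaries, one dictionary per row. The variable names are hypothetical; the values reuse figures that appear elsewhere in these notes. A key missing from a row becomes `NaN`, which illustrates the missing-data handling.

```python
import pandas as pd

# Each dictionary is one row; the first row has no 'pop' key
records = [
    {'state': 'Texas', 'year': 2021},
    {'state': 'Ohio', 'year': 2025, 'pop': 11799448},
]

frame_from_records = pd.DataFrame(records)
print(frame_from_records)

# The missing value forces the 'pop' column to a floating-point dtype
print(frame_from_records.dtypes)
```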


#### Examples 2.2



*   frame_1


In [19]:
data_dict = {
    'state': ['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Ohio'], # List of state names
    'year': [2020, 2021, 2022, 2023, 2024, 2025], # List of corresponding years
    'pop': [39538223, 29145505, 20201249, 21538187, 12812508, 11799448] # List of population values
}

frame_1 = pd.DataFrame(data_dict) # Creates a DataFrame from a dictionary
print(frame_1) # Displays the DataFrame

        state  year       pop
0  California  2020  39538223
1       Texas  2021  29145505
2    New York  2022  20201249
3     Florida  2023  21538187
4    Illinois  2024  12812508
5        Ohio  2025  11799448


In [20]:
frame_1.head() # Returns the first 5 rows of the DataFrame

Unnamed: 0,state,year,pop
0,California,2020,39538223
1,Texas,2021,29145505
2,New York,2022,20201249
3,Florida,2023,21538187
4,Illinois,2024,12812508


In [21]:
frame_1.tail() # Returns the last 5 rows of the DataFrame

Unnamed: 0,state,year,pop
1,Texas,2021,29145505
2,New York,2022,20201249
3,Florida,2023,21538187
4,Illinois,2024,12812508
5,Ohio,2025,11799448




*   frame_2



In [22]:
frame_2 = pd.DataFrame(data_dict, columns=['year', 'state', 'pop']) # Creates a DataFrame with specified column order
print(frame_2) # Displays the DataFrame with specified column order

   year       state       pop
0  2020  California  39538223
1  2021       Texas  29145505
2  2022    New York  20201249
3  2023     Florida  21538187
4  2024    Illinois  12812508
5  2025        Ohio  11799448


In [23]:
frame_2 = pd.DataFrame(data_dict, columns=['year', 'state', 'pop', 'debt'], index=['one', 'two', 'three', 'four', 'five', 'six']) # Creates a DataFrame with specified column order and index
print(frame_2) # Displays the DataFrame with specified column order and index

       year       state       pop debt
one    2020  California  39538223  NaN
two    2021       Texas  29145505  NaN
three  2022    New York  20201249  NaN
four   2023     Florida  21538187  NaN
five   2024    Illinois  12812508  NaN
six    2025        Ohio  11799448  NaN


In [24]:
frame_2['state'] # Accesses the 'state' column of the DataFrame

Unnamed: 0,state
one,California
two,Texas
three,New York
four,Florida
five,Illinois
six,Ohio


In [25]:
frame_2.year # Accesses the 'year' column of the DataFrame using dot notation

Unnamed: 0,year
one,2020
two,2021
three,2022
four,2023
five,2024
six,2025


In [26]:
frame_2['debt'] = 10 # Updating the 'debt' column in the DataFrame 'frame_2' where it previously had NaN values, now assigning it a constant value of 10
print(frame_2) # Displaying the updated DataFrame with 'debt' column now containing the value 10

       year       state       pop  debt
one    2020  California  39538223    10
two    2021       Texas  29145505    10
three  2022    New York  20201249    10
four   2023     Florida  21538187    10
five   2024    Illinois  12812508    10
six    2025        Ohio  11799448    10


In [27]:
frame_2['debt'] = np.arange(6) # Assigning a sequence of numbers (from 0 to 5) to the 'debt' column in 'frame_2'
print(frame_2) # Displaying the DataFrame with the updated 'debt' column containing values from 0 to 5

       year       state       pop  debt
one    2020  California  39538223     0
two    2021       Texas  29145505     1
three  2022    New York  20201249     2
four   2023     Florida  21538187     3
five   2024    Illinois  12812508     4
six    2025        Ohio  11799448     5


In [28]:
frame_2['eastern'] = frame_2.state == 'Texas' # Creating a new column 'eastern' in 'frame_2', where the value is True if 'state' is 'Texas', otherwise False
print(frame_2) # Displaying the updated DataFrame with the new 'eastern' column

       year       state       pop  debt  eastern
one    2020  California  39538223     0    False
two    2021       Texas  29145505     1     True
three  2022    New York  20201249     2    False
four   2023     Florida  21538187     3    False
five   2024    Illinois  12812508     4    False
six    2025        Ohio  11799448     5    False


In [29]:
del frame_2['eastern'] # Deleting the 'eastern' column from the DataFrame 'frame_2'
print(frame_2) # Displaying the DataFrame after deleting the 'eastern' column

       year       state       pop  debt
one    2020  California  39538223     0
two    2021       Texas  29145505     1
three  2022    New York  20201249     2
four   2023     Florida  21538187     3
five   2024    Illinois  12812508     4
six    2025        Ohio  11799448     5




*   frame_3


In [30]:
# Creating a new dictionary with states as keys, and dictionaries as values for each state
state_info = {
    'Texas': {'year': 2000, 'pop': 25000000, 'debt': 10, 'eastern': True},
    'California': {'year': 2005, 'pop': 37000000, 'debt': 15, 'eastern': False}
}

print(state_info)  # Displaying the dictionary

{'Texas': {'year': 2000, 'pop': 25000000, 'debt': 10, 'eastern': True}, 'California': {'year': 2005, 'pop': 37000000, 'debt': 15, 'eastern': False}}


In [31]:
frame_3 = pd.DataFrame(state_info) # Creating a DataFrame from the dictionary 'state_info'
print(frame_3) # Displaying the DataFrame

            Texas California
year         2000       2005
pop      25000000   37000000
debt           10         15
eastern      True      False


In [32]:
frame_3.T # The .T attribute transposes the DataFrame, swapping rows with columns

Unnamed: 0,year,pop,debt,eastern
Texas,2000,25000000,10,True
California,2005,37000000,15,False


In [33]:
frame_3.index.name = 'year' # Setting the name of the index to 'year'
frame_3.columns.name = 'state' # Setting the name of the columns to 'state'
print(frame_3) # Displaying the DataFrame with index and column names

state       Texas California
year                        
year         2000       2005
pop      25000000   37000000
debt           10         15
eastern      True      False


In [34]:
frame_3.to_numpy() # Converting the DataFrame to a NumPy array

array([[2000, 2005],
       [25000000, 37000000],
       [10, 15],
       [True, False]], dtype=object)

### Updating the index

- **Reindexing** means conforming a DataFrame or Series to a new set of row or column labels.
- The `.reindex()` method is used to achieve this. It returns a new object and leaves the original unchanged.
- The data is aligned to the new labels: existing labels keep their values (possibly in a new order), and any label that was not present in the original gets `NaN` in the corresponding row or column.

#### Key Points:
- **For DataFrame**: Rows and columns can be reindexed.
- **For Series**: Only the index can be reindexed.
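
The `NaN` behavior described above can be sketched as follows. The frame mirrors the one used in the examples of this section; the label `'d'` is a hypothetical label that does not exist in the original index, and `fill_value` is shown as one way to replace the default `NaN`.

```python
import pandas as pd

df_small = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# 'd' is not in the original index, so its row is filled with NaN
with_missing = df_small.reindex(['a', 'b', 'd'])
print(with_missing)

# fill_value replaces the default NaN for newly introduced labels
with_fill = df_small.reindex(['a', 'b', 'd'], fill_value=0)
print(with_fill)

# The original frame is unchanged
print(df_small)
```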

#### Examples 2.4

In [35]:
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

print(df)

   A  B
a  1  4
b  2  5
c  3  6


In [36]:
# Reindex the rows
df_reindexed = df.reindex(['c', 'b', 'a'])
print(df_reindexed)

   A  B
c  3  6
b  2  5
a  1  4


In [37]:
# Reindex columns
df_reindexed_columns = df.reindex(columns=['B', 'A'])
print(df_reindexed_columns)

   B  A
a  4  1
b  5  2
c  6  3


### Dropping Elements Along an Axis in Pandas

- The `.drop()` function in Pandas allows you to remove rows or columns from a DataFrame or Series.
- You can specify the axis along which to drop the elements:
  - **Axis 0**: Removes rows.
  - **Axis 1**: Removes columns.
  
- The `.drop()` function returns a new DataFrame or Series with the specified elements removed, leaving the original DataFrame unaltered unless `inplace=True` is specified.
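
A small sketch of the point about return values, using a hypothetical two-row frame: the result of `.drop()` is a new object, and the original keeps all of its columns and rows.

```python
import pandas as pd

frame_small = pd.DataFrame({'state': ['Texas', 'Ohio'], 'year': [2021, 2025]})

# drop returns a new DataFrame; frame_small itself is not modified
without_year = frame_small.drop(columns='year')
print(without_year.columns.tolist())
print(frame_small.columns.tolist())

# The same holds for a Series: the label 'a' is removed only in the result
series_small = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
without_a = series_small.drop('a')
print(without_a)
```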

#### Examples 2.5

In [38]:
data_dict = {
    'state': ['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Ohio'], # List of state names
    'year': [2020, 2021, 2022, 2023, 2024, 2025], # List of corresponding years
    'pop': [39538223, 29145505, 20201249, 21538187, 12812508, 11799448] # List of population values
}

new_object = pd.DataFrame(data_dict) # Creates a DataFrame from a dictionary
print(new_object) # Displays the DataFrame

        state  year       pop
0  California  2020  39538223
1       Texas  2021  29145505
2    New York  2022  20201249
3     Florida  2023  21538187
4    Illinois  2024  12812508
5        Ohio  2025  11799448


In [39]:
new_object.drop([0, 1]) # Drops rows with indices 0 and 1 from the DataFrame

Unnamed: 0,state,year,pop
2,New York,2022,20201249
3,Florida,2023,21538187
4,Illinois,2024,12812508
5,Ohio,2025,11799448


In [40]:
new_object.drop(columns=['year', 'pop']) # Drops the 'year' and 'pop' columns from the DataFrame

Unnamed: 0,state
0,California
1,Texas
2,New York
3,Florida
4,Illinois
5,Ohio


In [41]:
new_object.drop(index=[0, 1], columns=['year', 'pop']) # Drops rows with indices 0 and 1 and columns 'year' and 'pop' from the DataFrame

Unnamed: 0,state
2,New York
3,Florida
4,Illinois
5,Ohio


In [42]:
# Drops the 'state' column from the DataFrame 'new_object'
# axis=1 indicates that the operation is applied to columns (not rows)
new_object.drop('state', axis=1)

Unnamed: 0,year,pop
0,2020,39538223
1,2021,29145505
2,2022,20201249
3,2023,21538187
4,2024,12812508
5,2025,11799448


In [43]:
# Drops the 'year' column from the DataFrame 'new_object'
# 'axis="columns"' is another way to specify that the operation is applied to columns
new_object.drop('year', axis='columns')

Unnamed: 0,state,pop
0,California,39538223
1,Texas,29145505
2,New York,20201249
3,Florida,21538187
4,Illinois,12812508
5,Ohio,11799448


### Indexing, Selection, and Filtering in Pandas


- **Indexing**: Refers to selecting specific rows or columns from a DataFrame or Series. It can be done using labels (e.g., `df['column_name']`) or integer positions (e.g., `df.iloc[0]`).
- **Selection**: Refers to accessing specific data points within a DataFrame or Series based on criteria. You can select columns using `df['column']` or use conditions like `df[df['column'] > 5]` to filter rows.
- **Filtering**: Used to retrieve a subset of data based on specific conditions or criteria. Filters can be applied using conditions such as `df[df['column'] == 'value']`, or using `.loc[]` or `.iloc[]` to select specific rows/columns based on the index or position.

These operations allow for powerful data manipulation and are essential in data analysis.
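
Filters can also be combined. The sketch below, on a hypothetical three-row frame built from values used in these notes, joins two conditions with `&` and uses `isin` to match against a list of values.

```python
import pandas as pd

frame_small = pd.DataFrame({
    'state': ['California', 'Texas', 'Ohio'],
    'year': [2020, 2021, 2025],
    'pop': [39538223, 29145505, 11799448],
})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
recent_and_large = frame_small[(frame_small['year'] > 2020) & (frame_small['pop'] > 20000000)]
print(recent_and_large)

# isin keeps the rows whose value appears in the given list
selected = frame_small[frame_small['state'].isin(['Texas', 'Ohio'])]
print(selected)
```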


#### Examples 2.6



*   series



In [44]:
# Create a Pandas Series with values from 0 to 3 and custom index labels 'a', 'b', 'c', 'd'

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

In [45]:
obj['b'] # Selects a single value by its label

1

In [46]:
obj[1:3] # Slices by position: takes positions 1 and 2, excluding position 3

Unnamed: 0,0
b,1
c,2


In [47]:
obj[['a', 'd']] # Selects several values by passing a list of labels

Unnamed: 0,0
a,0
d,3


In [48]:
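obj.iloc[[0, 3]] # Selects the values at positions 0 and 3; plain obj[[0, 3]] on a labeled index is deprecated and raises a FutureWarning

a    0
d    3
dtype: int64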


In [49]:
obj[obj < 2] # Keeps only the values smaller than 2 (boolean mask)

Unnamed: 0,0
a,0
b,1




*  dataframe



In [50]:
data_dict = {
    'state': ['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Ohio'], # List of state names
    'year': [2020, 2021, 2022, 2023, 2024, 2025], # List of corresponding years
    'pop': [39538223, 29145505, 20201249, 21538187, 12812508, 11799448] # List of population values
}

new_object = pd.DataFrame(data_dict) # Creates a DataFrame from a dictionary
print(new_object) # Displays the DataFrame

        state  year       pop
0  California  2020  39538223
1       Texas  2021  29145505
2    New York  2022  20201249
3     Florida  2023  21538187
4    Illinois  2024  12812508
5        Ohio  2025  11799448


In [51]:
new_object['year']

Unnamed: 0,year
0,2020
1,2021
2,2022
3,2023
4,2024
5,2025


In [52]:
new_object[['pop', 'state']]

Unnamed: 0,pop,state
0,39538223,California
1,29145505,Texas
2,20201249,New York
3,21538187,Florida
4,12812508,Illinois
5,11799448,Ohio


In [53]:
new_object[new_object['pop'] > 20000000] # Filter the rows in the DataFrame where the 'pop' column value is greater than 20,000,000.

Unnamed: 0,state,year,pop
0,California,2020,39538223
1,Texas,2021,29145505
2,New York,2022,20201249
3,Florida,2023,21538187


In [54]:
new_object[new_object['state'] == 'California']

Unnamed: 0,state,year,pop
0,California,2020,39538223


In [55]:
new_object[new_object['pop'] >= 20201249]

Unnamed: 0,state,year,pop
0,California,2020,39538223
1,Texas,2021,29145505
2,New York,2022,20201249
3,Florida,2023,21538187


### `loc` and `iloc` in Pandas



- `loc` is used for **label-based indexing**. It allows you to select rows and columns by their labels (names).
- `iloc` is used for **integer-location based indexing**. It allows you to select rows and columns by their integer position.

##### Key Differences:
- `loc`: Includes both the start and end points in slicing.
- `iloc`: Excludes the end point in slicing (like Python's normal slicing behavior).
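
The slicing difference can be sketched on a small, hypothetical Series: the label slice includes its end label, while the positional slice stops before its end position.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based slice: the end label 'c' is included
by_label = s.loc['a':'c']
print(by_label)

# Position-based slice: position 2 is excluded, as in regular Python slicing
by_position = s.iloc[0:2]
print(by_position)
```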

#### Examples 2.7

In [56]:
data_dict = {
    'year': [2020, 2021, 2022, 2023, 2024, 2025], # List of corresponding years
    'pop': [39538223, 29145505, 20201249, 21538187, 12812508, 11799448] # List of population values
}

data = pd.DataFrame(data_dict, index= ['California', 'Texas', 'New York', 'Florida', 'Illinois', 'Ohio']) # Creates a DataFrame from a dictionary
print(data) # Displays the DataFrame

            year       pop
California  2020  39538223
Texas       2021  29145505
New York    2022  20201249
Florida     2023  21538187
Illinois    2024  12812508
Ohio        2025  11799448




*   loc



In [57]:
data.loc["Ohio"]

Unnamed: 0,Ohio
year,2025
pop,11799448


In [58]:
data.loc[['Texas', 'New York']]

Unnamed: 0,year,pop
Texas,2021,29145505
New York,2022,20201249


In [59]:
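data.loc['California', ['year']] # Row 'California' and the column list ['year']; the list selector makes the result a one-element Series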

Unnamed: 0,California
year,2020


In [60]:
data.loc[data.year > 2022]

Unnamed: 0,year,pop
Florida,2023,21538187
Illinois,2024,12812508
Ohio,2025,11799448




*   iloc


In [61]:
data.iloc[2]

Unnamed: 0,New York
year,2022
pop,20201249


In [62]:
data.iloc[[2,1, -1]]

Unnamed: 0,year,pop
New York,2022,20201249
Texas,2021,29145505
Ohio,2025,11799448


In [63]:
data.iloc[[1,2], [-2]] # Rows at positions 1 and 2, and the second-to-last column ('year')

Unnamed: 0,year
Texas,2021
New York,2022


In [64]:
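subset = data.iloc[:4] # Takes the first four rows by position
subset[subset.year > 2022] # Filters with a mask built from the same rows, which avoids the "Boolean Series key will be reindexed" warning
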

Unnamed: 0,year,pop
Florida,2023,21538187


### `map` and `apply` in Pandas



- **`map()`**: Applies a function to each element. `Series.map()` works on every value of a `Series`, and `DataFrame.map()` (pandas 2.1 and later, replacing `applymap()`) works on every cell of a `DataFrame`.
  - Example: `series.map(lambda x: x * 2)`

- **`apply()`**: Applies a function to whole columns or rows of a `DataFrame`. With `axis=0` (the default) the function receives each column; with `axis=1` it receives each row.
  - Example: `df.apply(np.sum, axis=0)` (sum of each column)
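
The two methods can be contrasted in a short sketch. All names and values below are hypothetical and serve only to show which object the function receives in each case.

```python
import pandas as pd

prices = pd.Series([1.5, 2.0, 3.25], index=['x', 'y', 'z'])

# map: the lambda receives one value at a time
doubled = prices.map(lambda v: v * 2)
print(doubled)

table = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# apply with the default axis=0: the lambda receives each column as a Series
column_ranges = table.apply(lambda col: col.max() - col.min())
print(column_ranges)

# apply with axis=1: the lambda receives each row as a Series
row_totals = table.apply(lambda row: row.sum(), axis=1)
print(row_totals)
```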

#### Examples 2.8

In [66]:
frame = pd.DataFrame(np.random.randn(4,3),
                     columns = list('abc'),
                     index= ['Ohio', 'Colorado', 'Utah', 'New York'])
print(frame) # Displaying the DataFrame

                 a         b         c
Ohio     -0.605644 -0.665137 -0.759131
Colorado -0.322191  2.266001  1.939049
Utah      1.208343 -1.328859 -1.023770
New York -0.569145  0.140427  0.009611


In [67]:
np.abs(frame) # Applying the absolute value function to each element in the DataFrame

Unnamed: 0,a,b,c
Ohio,0.605644,0.665137,0.759131
Colorado,0.322191,2.266001,1.939049
Utah,1.208343,1.328859,1.02377
New York,0.569145,0.140427,0.009611


In [68]:
def fl1(x):
    return x.max() + x.min()

frame.apply(fl1) # Applying the 'fl1' function to each column in the DataFrame

Unnamed: 0,0
a,0.602699
b,0.937142
c,0.915278


In [69]:
frame.apply(fl1, axis='columns')

Unnamed: 0,0
Ohio,-1.364775
Colorado,1.94381
Utah,-0.120516
New York,-0.428718


In [72]:
frame.map(lambda x: x**2)

Unnamed: 0,a,b,c
Ohio,0.366804,0.442407,0.57628
Colorado,0.103807,5.134761,3.759911
Utah,1.460093,1.765867,1.048106
New York,0.323926,0.01972,9.2e-05


In [71]:
def my_format(x):
    return f"{x:.3f}"

frame.map(my_format) # Applying the 'my_format' function to each element in the DataFrame (DataFrame.map replaces applymap, which is deprecated since pandas 2.1)


Unnamed: 0,a,b,c
Ohio,-0.606,-0.665,-0.759
Colorado,-0.322,2.266,1.939
Utah,1.208,-1.329,-1.024
New York,-0.569,0.140,0.010


### Sorting in Pandas: `sort_index()` and `sort_values()`

##### Pandas provides two main sorting methods:

- **`sort_index()`** – Sorts a DataFrame or Series based on the index.
- **`sort_values()`** – Sorts a DataFrame or Series based on column or row values.

### `sort_index()`
Sorts data by index labels.  
- `ascending=True` (default) – sorts in ascending order.  
- `ascending=False` – sorts in descending order.  
- `axis=0` (default) – sorts rows, `axis=1` – sorts columns.  

### `sort_values()`
Sorts data by specific column values.  
- `by` – column(s) used for sorting.  
- `ascending=True` (default) – ascending order, `False` for descending.  
- `na_position='last'` (default) – places `NaN` at the end, `'first'` moves them to the beginning.  
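
Since `by` accepts several columns, a sort can use one column as the primary key and another to break ties. The sketch below uses a hypothetical frame and passes a list to `ascending` so that each column gets its own direction.

```python
import pandas as pd

scores = pd.DataFrame({
    'group': ['b', 'a', 'b', 'a'],
    'score': [1, 3, 2, 4],
})

# Sort by 'group' ascending, then by 'score' descending within each group
ordered = scores.sort_values(by=['group', 'score'], ascending=[True, False])
print(ordered)
```

The original row labels travel with the rows, so the index of `ordered` shows where each row came from.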

#### Examples 2.9

In [82]:
# Creating a sample DataFrame
df = pd.DataFrame({
    'A': [3, 1, 2, 5, 4],
    'B': [9, 7, 8, 6, 5]
}, index=['d', 'b', 'c', 'a', 'e'])

df

Unnamed: 0,A,B
d,3,9
b,1,7
c,2,8
a,5,6
e,4,5




*   sort_index



In [78]:
# Sorting by index in ascending order (default)
df.sort_index()

Unnamed: 0,A,B
a,5,6
b,1,7
c,2,8
d,3,9
e,4,5


In [79]:
# Sorting by index in descending order
df.sort_index(ascending=False)

Unnamed: 0,A,B
e,4,5
d,3,9
c,2,8
b,1,7
a,5,6


In [81]:
# Sorting columns (axis=1) based on column names
df.sort_index(axis=1)

Unnamed: 0,A,B
d,3,9
b,1,7
c,2,8
a,5,6
e,4,5




*  sort_values



In [84]:
# Sorting by column 'A' in ascending order (default)
df.sort_values(by='A')

Unnamed: 0,A,B
b,1,7
c,2,8
d,3,9
e,4,5
a,5,6


In [86]:
# Sorting by column 'B' in descending order
df.sort_values(by='B', ascending=False)

Unnamed: 0,A,B
d,3,9
c,2,8
b,1,7
a,5,6
e,4,5


In [89]:
# Sorting by column 'A' and handling NaN values first
df_with_nan = pd.DataFrame({'A': [3, np.nan, 2, 1, np.nan]})
df_with_nan.sort_values(by= 'A', na_position='first')

Unnamed: 0,A
1,NaN
4,NaN
3,1.0
2,2.0
0,3.0


### Pandas `rank()` Function


##### The `rank()` method assigns ranks to the values in a DataFrame or Series, for example to order scores from lowest to highest. It offers several ranking methods and options that control how ties and `NaN` values are handled.

##### Parameters:

- **`axis`**:
  - `0` or `'index'` (default) – Ranks along the rows (i.e., ranking each column's values).
  - `1` or `'columns'` – Ranks along the columns (i.e., ranking each row's values).

- **`method`**: Specifies how to rank tied values.
  - **`'average'`** (default) – Tied values receive the average rank.
  - **`'min'`** – Tied values receive the minimum rank.
  - **`'max'`** – Tied values receive the maximum rank.
  - **`'first'`** – Tied values receive ranks in the order they appear.
  - **`'dense'`** – Like `'min'`, but the rank increases by exactly 1 from one group of tied values to the next, so there are no gaps in the ranking.
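
To compare the tie-handling methods side by side, the sketch below ranks the same scores from highest to lowest with four different `method` values. The column names of the comparison frame are hypothetical; the scores match the ones used in the examples of this section.

```python
import pandas as pd

scores = pd.Series([87, 92, 92, 75, 60])

# Rank from highest to lowest so that the tie (92, 92) sits at the top
comparison = pd.DataFrame({
    'score': scores,
    'average': scores.rank(method='average', ascending=False),
    'min': scores.rank(method='min', ascending=False),
    'max': scores.rank(method='max', ascending=False),
    'dense': scores.rank(method='dense', ascending=False),
})
print(comparison)
```

With `'min'` the score after the tie gets rank 3, while with `'dense'` it gets rank 2, which is the "no gaps" behavior.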


#### Examples 2.10


In [91]:
data = {'Score': [87, 92, 92, 75, 60]}

df = pd.DataFrame(data)
df

Unnamed: 0,Score
0,87
1,92
2,92
3,75
4,60


In [95]:
df.rank() # Default method='average': the two tied scores of 92 share the average of ranks 4 and 5

Unnamed: 0,Score
0,3.0
1,4.5
2,4.5
3,2.0
4,1.0


In [94]:
df.rank(method='min', ascending=False) # Ranks from highest to lowest; both tied scores receive the lowest rank of their group (1)

Unnamed: 0,Score
0,3.0
1,1.0
2,1.0
3,4.0
4,5.0


In [96]:
df.rank(method='first') # Breaks ties by order of appearance, so the two scores of 92 receive ranks 4 and 5

Unnamed: 0,Score
0,3.0
1,4.0
2,5.0
3,2.0
4,1.0
