## Pandas Series

Pandas provides two main data structures: **Series** and **DataFrame**. This section covers the Series and compares it with a NumPy array.

---

### **1. Pandas Series**

- A **Series** is a **one-dimensional array**.
- It can hold data of any type (integers, floats, strings, etc.).
- Think of it as a **column in a table** or a **list with labels** (index).

#### **Key Characteristics**
- **One-dimensional**: Contains a single column of data.
- **Indexed**: Each value has an associated index.
- **Homogeneous**: A Series has a single data type (dtype). Values of mixed types are stored under the generic `object` dtype.
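
The three characteristics above can be checked on a tiny Series. This is a minimal sketch with made-up values; the names `s` and `mixed` are illustrative and do not come from the notebook.

```python
import pandas as pd

# A Series holds one column of values plus an index of labels.
s = pd.Series([10, 25, 6], index=["Mon", "Tue", "Wed"], name="Sales")

print(s.ndim)         # 1: one-dimensional
print(list(s.index))  # ['Mon', 'Tue', 'Wed']
print(s.dtype)        # int64: a single dtype for the whole Series

# Mixing value types does not fail; the Series falls back to the object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)    # object
```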

### **Comparison Table**

| **Feature**              | **Pandas Series**                   | **NumPy Array**                      |
|--------------------------|--------------------------------------|--------------------------------------|
| **Dimensionality**       | 1D (One-dimensional)                | 1D, 2D, or nD (Multi-dimensional)   |
| **Indexing**             | Labeled (customizable index)        | Positional (numeric only)           |
| **Data Type**            | One dtype per Series; mixed values fall back to `object` | One dtype per array |
| **Missing Data Handling**| `NaN`-aware methods (`NaN` skipped by default) | `NaN` propagates; needs `np.nansum` and similar |
| **Operations**           | High-level operations (filtering, aggregation, etc.) | Optimized for numerical operations |
| **Use Case**             | Data analysis and manipulation      | High-performance numerical computations |
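
The indexing and missing-data rows of the table can be illustrated with a short sketch. The values are invented for illustration, and `arr` and `ser` are hypothetical names.

```python
import numpy as np
import pandas as pd

values = [10.0, np.nan, 6.0]
arr = np.array(values)
ser = pd.Series(values, index=["Mon", "Tue", "Wed"])

# Indexing: a NumPy array is addressed by position, a Series also by label.
print(arr[2])      # 6.0
print(ser["Wed"])  # 6.0

# Missing data: NaN propagates through np.sum, while Series.sum skips it by default.
print(np.sum(arr))     # nan
print(np.nansum(arr))  # 16.0
print(ser.sum())       # 16.0
```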



### First Steps with Pandas Series

In [1]:
import pandas as pd

In [2]:
titanic = pd.read_csv("titanic.csv")

In [3]:
titanic

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.2500,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.9250,S,
3,1,1,female,35.0,1,0,53.1000,S,C
4,0,3,male,35.0,0,0,8.0500,S,
...,...,...,...,...,...,...,...,...,...
886,0,2,male,27.0,0,0,13.0000,S,
887,1,1,female,19.0,0,0,30.0000,S,B
888,0,3,female,,1,2,23.4500,S,
889,1,1,male,26.0,0,0,30.0000,C,C


In [4]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 9 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   survived  891 non-null    int64  
 1   pclass    891 non-null    int64  
 2   sex       891 non-null    object 
 3   age       714 non-null    float64
 4   sibsp     891 non-null    int64  
 5   parch     891 non-null    int64  
 6   fare      891 non-null    float64
 7   embarked  889 non-null    object 
 8   deck      203 non-null    object 
dtypes: float64(2), int64(4), object(3)
memory usage: 62.8+ KB


In [5]:
titanic["age"]

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, Length: 891, dtype: float64

In [6]:
type(titanic["age"])

pandas.core.series.Series

In [7]:
titanic["age"].equals(titanic.age)

True

In [8]:
age = titanic["age"]

In [9]:
age.head(2)

0    22.0
1    38.0
Name: age, dtype: float64

In [10]:
age.tail()

886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, dtype: float64

In [11]:
age.dtype

dtype('float64')

In [12]:
age.shape

(891,)

In [13]:
len(age)

891

In [14]:
age.index

RangeIndex(start=0, stop=891, step=1)

In [15]:
age.info()

<class 'pandas.core.series.Series'>
RangeIndex: 891 entries, 0 to 890
Series name: age
Non-Null Count  Dtype  
--------------  -----  
714 non-null    float64
dtypes: float64(1)
memory usage: 7.1 KB


In [16]:
age.to_frame().info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   age     714 non-null    float64
dtypes: float64(1)
memory usage: 7.1 KB


### Analyzing Numerical Series

In [17]:
age

0      22.0
1      38.0
2      26.0
3      35.0
4      35.0
       ... 
886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, Length: 891, dtype: float64

In [18]:
age.describe()

count    714.000000
mean      29.699118
std       14.526497
min        0.420000
25%       20.125000
50%       28.000000
75%       38.000000
max       80.000000
Name: age, dtype: float64

In [19]:
age.count()

np.int64(714)

In [20]:
age.size

891

In [21]:
len(age)

891

In [None]:
age.sum()
# skipna=True is the default, so NaN values are ignored

np.float64(21205.17)

In [23]:
sum(age)

nan
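
The cells above mix two ways of counting and two ways of summing. A minimal sketch on a hypothetical four-value stand-in for the age column shows how they differ:

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 38.0, np.nan, 35.0])  # made-up values, one missing

print(ages.size)     # 4: all entries, including NaN
print(len(ages))     # 4
print(ages.count())  # 3: non-missing entries only

print(ages.sum())              # 95.0, NaN skipped (skipna=True is the default)
print(ages.sum(skipna=False))  # nan
print(sum(ages))               # nan, Python's built-in sum is not NaN-aware
```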

In [31]:
age.mean(skipna=True)

np.float64(29.69911764705882)

In [25]:
age.median()

np.float64(28.0)

In [26]:
age.std()

np.float64(14.526497332334042)

In [28]:
age.min()

np.float64(0.42)

In [29]:
age.max()

np.float64(80.0)

In [30]:
age.unique()

array([22.  , 38.  , 26.  , 35.  ,   nan, 54.  ,  2.  , 27.  , 14.  ,
        4.  , 58.  , 20.  , 39.  , 55.  , 31.  , 34.  , 15.  , 28.  ,
        8.  , 19.  , 40.  , 66.  , 42.  , 21.  , 18.  ,  3.  ,  7.  ,
       49.  , 29.  , 65.  , 28.5 ,  5.  , 11.  , 45.  , 17.  , 32.  ,
       16.  , 25.  ,  0.83, 30.  , 33.  , 23.  , 24.  , 46.  , 59.  ,
       71.  , 37.  , 47.  , 14.5 , 70.5 , 32.5 , 12.  ,  9.  , 36.5 ,
       51.  , 55.5 , 40.5 , 44.  ,  1.  , 61.  , 56.  , 50.  , 36.  ,
       45.5 , 20.5 , 62.  , 41.  , 52.  , 63.  , 23.5 ,  0.92, 43.  ,
       60.  , 10.  , 64.  , 13.  , 48.  ,  0.75, 53.  , 57.  , 80.  ,
       70.  , 24.5 ,  6.  ,  0.67, 30.5 ,  0.42, 34.5 , 74.  ])

In [33]:
len(age.unique())

89

In [37]:
# nunique() returns the number of unique values in the Series.
# dropna=False counts NaN as one unique value; the default dropna=True excludes it.
age.nunique(dropna=False)

89
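
As a small sketch of how `unique()` and `nunique()` treat missing values (the Series below is made up for illustration):

```python
import numpy as np
import pandas as pd

ages = pd.Series([22.0, 38.0, np.nan, 22.0, np.nan])

print(len(ages.unique()))          # 3: unique() keeps NaN as one value
print(ages.nunique())              # 2: NaN excluded (dropna=True is the default)
print(ages.nunique(dropna=False))  # 3: NaN counted once
```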

In [38]:
age.value_counts()

age
24.00    30
22.00    27
18.00    26
28.00    25
30.00    25
         ..
24.50     1
0.67      1
0.42      1
34.50     1
74.00     1
Name: count, Length: 88, dtype: int64

In [39]:
age.value_counts(sort = True)

age
24.00    30
22.00    27
18.00    26
28.00    25
30.00    25
         ..
24.50     1
0.67      1
0.42      1
34.50     1
74.00     1
Name: count, Length: 88, dtype: int64

In [40]:
age.value_counts(sort = False) 
# value_counts() sorts by count in descending order by default; sort=False turns that off, so the values appear here in order of first occurrence

age
22.00    27
38.00    11
26.00    18
35.00    18
54.00     8
         ..
0.67      1
30.50     2
0.42      1
34.50     1
74.00     1
Name: count, Length: 88, dtype: int64

In [41]:
age.value_counts(dropna = True)

age
24.00    30
22.00    27
18.00    26
28.00    25
30.00    25
         ..
24.50     1
0.67      1
0.42      1
34.50     1
74.00     1
Name: count, Length: 88, dtype: int64

In [42]:
age.value_counts(dropna = False)

age
NaN      177
24.00     30
22.00     27
18.00     26
28.00     25
        ... 
24.50      1
0.67       1
0.42       1
34.50      1
74.00      1
Name: count, Length: 89, dtype: int64

In [43]:
age.value_counts(ascending = False)

age
24.00    30
22.00    27
18.00    26
28.00    25
30.00    25
         ..
24.50     1
0.67      1
0.42      1
34.50     1
74.00     1
Name: count, Length: 88, dtype: int64

In [44]:
age.value_counts(ascending = True)

age
66.0     1
12.0     1
70.5     1
36.5     1
20.5     1
        ..
28.0    25
30.0    25
18.0    26
22.0    27
24.0    30
Name: count, Length: 88, dtype: int64

In [None]:
age.value_counts(sort = True, dropna = True, ascending = False, normalize = False)

In [45]:
age.value_counts(sort = True, dropna = True, ascending = False, normalize = True)
# normalize=True returns proportions instead of counts.
# proportion = count of a value / number of counted entries (the non-missing entries when dropna=True)

age
24.00    0.042017
22.00    0.037815
18.00    0.036415
28.00    0.035014
30.00    0.035014
           ...   
24.50    0.001401
0.67     0.001401
0.42     0.001401
34.50    0.001401
74.00    0.001401
Name: proportion, Length: 88, dtype: float64

In [46]:
30/age.count()

np.float64(0.04201680672268908)

In [None]:
age.value_counts(sort = True, dropna = False, ascending = False, normalize = True)

In [47]:
print(age.size)
print(30/age.size)  # share of age 24 among all 891 entries, NaN included

891
0.03367003367003367
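
The two denominators used above (`age.count()` versus `age.size`) correspond to `dropna=True` and `dropna=False`. A minimal sketch with invented values:

```python
import numpy as np
import pandas as pd

ages = pd.Series([24.0, 24.0, 22.0, np.nan])

# dropna=True (default): proportions are relative to the non-missing entries.
share = ages.value_counts(normalize=True)
print(share.loc[24.0])   # same as 2 / ages.count()
print(2 / ages.count())

# dropna=False: NaN gets its own row and the denominator becomes the full size.
share_all = ages.value_counts(normalize=True, dropna=False)
print(share_all.loc[24.0])  # 0.5
print(2 / ages.size)        # 0.5
```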


In [48]:
age.value_counts(sort = True, dropna = True, ascending= False, normalize = False, bins = 5)
# Using bins to group the age values into ranges (e.g., age groups).


(16.336, 32.252]    346
(32.252, 48.168]    188
(0.339, 16.336]     100
(48.168, 64.084]     69
(64.084, 80.0]       11
Name: count, dtype: int64

In [49]:
age.value_counts(sort = True, dropna = True, ascending= False, normalize = True, bins = 10)

(16.336, 24.294]    0.198653
(24.294, 32.252]    0.189675
(32.252, 40.21]     0.132435
(40.21, 48.168]     0.078563
(0.339, 8.378]      0.060606
(8.378, 16.336]     0.051627
(48.168, 56.126]    0.050505
(56.126, 64.084]    0.026936
(64.084, 72.042]    0.010101
(72.042, 80.0]      0.002245
Name: proportion, dtype: float64
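
The `bins` argument groups values into equal-width intervals before counting. The sketch below, on made-up ages, compares it with doing the binning by hand via `pd.cut`; the interval edges may differ slightly at the lower bound, but the counts per bin should match.

```python
import pandas as pd

ages = pd.Series([1, 5, 18, 22, 30, 41, 65, 80])

# bins=4 splits the value range into 4 equal-width intervals and counts per interval.
# sort=False keeps the intervals in ascending order instead of sorting by count.
binned = ages.value_counts(bins=4, sort=False)
print(binned)

# Roughly the same grouping done by hand with pd.cut.
by_hand = pd.cut(ages, bins=4).value_counts(sort=False)
print(by_hand)
```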

### Analyzing Non-Numerical Series

In [50]:
import pandas as pd

In [51]:
summer = pd.read_csv("summer.csv")

In [52]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver


In [None]:
summer.info()

In [53]:
athlete = summer["Athlete"]

In [54]:
athlete.head()

0         HAJOS, Alfred
1      HERSCHMANN, Otto
2     DRIVAS, Dimitrios
3    MALOKINIS, Ioannis
4    CHASAPIS, Spiridon
Name: Athlete, dtype: object

In [55]:
athlete.tail(5)

31160           JANIKOWSKI, Damian
31161    REZAEI, Ghasem Gholamreza
31162               TOTROV, Rustam
31163            ALEKSANYAN, Artur
31164               LIDBERG, Jimmy
Name: Athlete, dtype: object

In [56]:
type(athlete)

pandas.core.series.Series

In [57]:
athlete.dtype

dtype('O')

In [58]:
athlete.shape

(31165,)

In [59]:
athlete.describe()

count               31165
unique              22762
top       PHELPS, Michael
freq                   22
Name: Athlete, dtype: object

In [60]:
athlete.size

31165

In [61]:
athlete.count()

np.int64(31165)

In [None]:
athlete.min()

In [None]:
athlete.unique()

In [None]:
len(athlete.unique())

In [None]:
athlete.nunique(dropna= False)

In [None]:
athlete.value_counts()

In [None]:
athlete.value_counts(sort = True, ascending=True)

In [None]:
athlete.value_counts(sort = True, ascending=False, normalize = True).head()

### Creating Pandas Series (Part 1)

In [62]:
import pandas as pd

#### from DataFrame

In [63]:
summer = pd.read_csv("summer.csv")

In [64]:
summer.head()

Unnamed: 0,Year,City,Sport,Discipline,Athlete,Country,Gender,Event,Medal
0,1896,Athens,Aquatics,Swimming,"HAJOS, Alfred",HUN,Men,100M Freestyle,Gold
1,1896,Athens,Aquatics,Swimming,"HERSCHMANN, Otto",AUT,Men,100M Freestyle,Silver
2,1896,Athens,Aquatics,Swimming,"DRIVAS, Dimitrios",GRE,Men,100M Freestyle For Sailors,Bronze
3,1896,Athens,Aquatics,Swimming,"MALOKINIS, Ioannis",GRE,Men,100M Freestyle For Sailors,Gold
4,1896,Athens,Aquatics,Swimming,"CHASAPIS, Spiridon",GRE,Men,100M Freestyle For Sailors,Silver


In [65]:
summer["Athlete"]

0                    HAJOS, Alfred
1                 HERSCHMANN, Otto
2                DRIVAS, Dimitrios
3               MALOKINIS, Ioannis
4               CHASAPIS, Spiridon
                   ...            
31160           JANIKOWSKI, Damian
31161    REZAEI, Ghasem Gholamreza
31162               TOTROV, Rustam
31163            ALEKSANYAN, Artur
31164               LIDBERG, Jimmy
Name: Athlete, Length: 31165, dtype: object

In [None]:
summer.Athlete

In [None]:
summer.iloc[0]

#### Importing from CSV

In [None]:
pd.read_csv("summer.csv", usecols = ["Athlete"], squeeze = True) # old: the squeeze parameter was deprecated in pandas 1.4 and removed in 2.0

In [66]:
pd.read_csv("summer.csv", usecols = ["Athlete"]).squeeze("columns") # new
# squeeze() converts a DataFrame with a single column or a single row into a Series, reducing its dimensionality.
# The argument "columns" squeezes along the column axis, so the one-column DataFrame is returned as a Series of Athlete values.

0                    HAJOS, Alfred
1                 HERSCHMANN, Otto
2                DRIVAS, Dimitrios
3               MALOKINIS, Ioannis
4               CHASAPIS, Spiridon
                   ...            
31160           JANIKOWSKI, Damian
31161    REZAEI, Ghasem Gholamreza
31162               TOTROV, Rustam
31163            ALEKSANYAN, Artur
31164               LIDBERG, Jimmy
Name: Athlete, Length: 31165, dtype: object
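
The effect of `squeeze("columns")` can be reproduced without the data file. This sketch uses a tiny in-memory CSV whose contents are invented, standing in for `summer.csv`:

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for summer.csv (contents are made up).
csv_text = "Athlete,Medal\nA,Gold\nB,Silver\n"

df = pd.read_csv(io.StringIO(csv_text), usecols=["Athlete"])
print(type(df))   # a DataFrame with a single column

ser = df.squeeze("columns")
print(type(ser))  # a Series

# Selecting the column by name gives the same Series.
print(ser.equals(df["Athlete"]))  # True
```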

#### Creating from scratch with pd.Series()

In [67]:
pd.Series([10,25,6,36,2])

0    10
1    25
2     6
3    36
4     2
dtype: int64

In [None]:
#pd.Series([10,25,6,36,2], index=["Mon","Tue","Wed","Thu", "Fri", "Sat"])  # raises ValueError: 6 index labels for 5 values

In [68]:
pd.Series([10,25,6,36,2], index=["Mon","Tue","Wed","Thu", "Fri"], name = "Sales")

Mon    10
Tue    25
Wed     6
Thu    36
Fri     2
Name: Sales, dtype: int64

### Creating Pandas Series (Part 2)

#### from NumPy Array

In [69]:
import pandas as pd
import numpy as np

In [70]:
sales = np.array([10,25,6,36,2])
sales

array([10, 25,  6, 36,  2])

In [71]:
pd.Series(sales)

0    10
1    25
2     6
3    36
4     2
dtype: int64

#### from List

In [72]:
sales = [10,25,6,36,2]

In [73]:
pd.Series(sales)

0    10
1    25
2     6
3    36
4     2
dtype: int64

#### from Dictionary

In [74]:
dic = {"Mon":10, "Tue":25, "Wed":6, "Thu": 36, "Fri": 2}
dic

{'Mon': 10, 'Tue': 25, 'Wed': 6, 'Thu': 36, 'Fri': 2}

In [75]:
sales = pd.Series(dic)

In [76]:
sales

Mon    10
Tue    25
Wed     6
Thu    36
Fri     2
dtype: int64

In [77]:
pd.Series(dic, index = ["Fri", "Sat", "Sun", "Mon", "Tue", "Wed"])

Fri     2.0
Sat     NaN
Sun     NaN
Mon    10.0
Tue    25.0
Wed     6.0
dtype: float64

In [78]:
pd.Series(dic, index = [1,2,3,4,5])

1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
dtype: float64
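
The last two outputs follow from how a dictionary is aligned with an explicit index: values are looked up by key, and labels that are not keys become `NaN`. A minimal sketch with a shortened, hypothetical dictionary:

```python
import pandas as pd

dic = {"Mon": 10, "Tue": 25}

# Keys become the index when no index is given.
print(pd.Series(dic).dtype)  # int64

# An explicit index selects and reorders by key. Labels missing from the dict
# become NaN, which forces the dtype from int64 to float64.
aligned = pd.Series(dic, index=["Tue", "Sat", "Mon"])
print(aligned)
print(aligned.isna().sum())  # 1
```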

### Indexing and Slicing

In [81]:
import pandas as pd

In [82]:
titanic = pd.read_csv("titanic.csv")

In [83]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


In [84]:
titanic.tail()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
886,0,2,male,27.0,0,0,13.0,S,
887,1,1,female,19.0,0,0,30.0,S,B
888,0,3,female,,1,2,23.45,S,
889,1,1,male,26.0,0,0,30.0,C,C
890,0,3,male,32.0,0,0,7.75,Q,


In [86]:
age = titanic.age

In [87]:
age.head()

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: age, dtype: float64

In [88]:
age.tail()

886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, dtype: float64

In [89]:
age.index

RangeIndex(start=0, stop=891, step=1)

In [90]:
age[0]

np.float64(22.0)

In [91]:
age[2]

np.float64(26.0)

In [92]:
age.iloc[-1]

np.float64(32.0)

In [93]:
age[890]

np.float64(32.0)

In [94]:
age[[3,4]]

3    35.0
4    35.0
Name: age, dtype: float64

In [95]:
age.loc[:3]

0    22.0
1    38.0
2    26.0
3    35.0
Name: age, dtype: float64
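
Note that `age.loc[:3]` returned four rows. A short sketch on made-up values contrasts label-based slicing with `.loc` and position-based slicing with `.iloc`:

```python
import pandas as pd

ages = pd.Series([22.0, 38.0, 26.0, 35.0, 35.0])

print(len(ages.loc[:3]))   # 4: label slice, the end label 3 is included
print(len(ages.iloc[:3]))  # 3: positional slice, the end position 3 is excluded

# With a non-default index the difference between label and position is easier to see.
shifted = pd.Series([22.0, 38.0, 26.0], index=[10, 20, 30])
print(shifted.loc[10])     # 22.0, looked up by label
print(shifted.iloc[0])     # 22.0, looked up by position
```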

In [98]:
summer = pd.read_csv("summer.csv", index_col = "Athlete")

In [99]:
summer.head()

Unnamed: 0_level_0,Year,City,Sport,Discipline,Country,Gender,Event,Medal
Athlete,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
"HAJOS, Alfred",1896,Athens,Aquatics,Swimming,HUN,Men,100M Freestyle,Gold
"HERSCHMANN, Otto",1896,Athens,Aquatics,Swimming,AUT,Men,100M Freestyle,Silver
"DRIVAS, Dimitrios",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Bronze
"MALOKINIS, Ioannis",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Gold
"CHASAPIS, Spiridon",1896,Athens,Aquatics,Swimming,GRE,Men,100M Freestyle For Sailors,Silver


In [100]:
event = summer.Event

In [101]:
event.head()

Athlete
HAJOS, Alfred                     100M Freestyle
HERSCHMANN, Otto                  100M Freestyle
DRIVAS, Dimitrios     100M Freestyle For Sailors
MALOKINIS, Ioannis    100M Freestyle For Sailors
CHASAPIS, Spiridon    100M Freestyle For Sailors
Name: Event, dtype: object

In [None]:
event.tail()

In [102]:
event.index

Index(['HAJOS, Alfred', 'HERSCHMANN, Otto', 'DRIVAS, Dimitrios',
       'MALOKINIS, Ioannis', 'CHASAPIS, Spiridon', 'CHOROPHAS, Efstathios',
       'HAJOS, Alfred', 'ANDREOU, Joannis', 'CHOROPHAS, Efstathios',
       'NEUMANN, Paul',
       ...
       'AHMADOV, Emin', 'KAZAKEVIC, Aleksandr', 'KHUGAEV, Alan',
       'EBRAHIM, Karam Mohamed Gaber', 'GAJIYEV, Danyal', 'JANIKOWSKI, Damian',
       'REZAEI, Ghasem Gholamreza', 'TOTROV, Rustam', 'ALEKSANYAN, Artur',
       'LIDBERG, Jimmy'],
      dtype='object', name='Athlete', length=31165)

In [None]:
event[0] # an integer key on a string index falls back to position; this fallback is deprecated, prefer event.iloc[0]

In [None]:
event[1]  # deprecated for the same reason; prefer event.iloc[1]

In [None]:
event.iloc[-1]

In [None]:
event.iloc[:3]

In [None]:
event["DRIVAS, Dimitrios"]

In [None]:
event[:"DRIVAS, Dimitrios"]

In [None]:
event.loc["PHELPS, Michael"]

In [None]:
event.loc["PHELPS, Michael"].equals(event["PHELPS, Michael"])

In [None]:
#event[:"PHELPS, Michael"]  # raises KeyError: the slice bound is ambiguous because the label is not unique

In [None]:
event.loc[["PHELPS, Michael", "LEWIS, Carl"]]

In [None]:
#event[["PHELPS, Michael", "DUCK, Donald"]]  # raises KeyError: "DUCK, Donald" is not in the index
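
Label lookups behave differently when the index contains duplicates or the label is missing. The sketch below uses a hypothetical three-row miniature of the `event` Series; the event names are invented.

```python
import pandas as pd

# Hypothetical miniature of the event Series, indexed by athlete name.
event = pd.Series(
    ["100M Freestyle", "Long Jump", "200M Butterfly"],
    index=["PHELPS, Michael", "LEWIS, Carl", "PHELPS, Michael"],
)

print(event.loc["LEWIS, Carl"])            # a single string: the label is unique
print(type(event.loc["PHELPS, Michael"]))  # a Series: the label occurs twice

# A list containing a label that is not in the index raises KeyError.
try:
    event.loc[["PHELPS, Michael", "DUCK, Donald"]]
    raised = False
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```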

### Sorting and Introduction to the inplace Parameter

In [103]:
import pandas as pd

In [104]:
dic = {1:10, 3:25, 2:6, 4:36, 5:2, 6:0, 7:None}
dic

{1: 10, 3: 25, 2: 6, 4: 36, 5: 2, 6: 0, 7: None}

In [105]:
sales = pd.Series(dic)
sales

1    10.0
3    25.0
2     6.0
4    36.0
5     2.0
6     0.0
7     NaN
dtype: float64

In [106]:
sales.sort_index()

1    10.0
2     6.0
3    25.0
4    36.0
5     2.0
6     0.0
7     NaN
dtype: float64

In [107]:
sales.sort_index(ascending = True, inplace= True)
# inplace=True modifies the Series directly and returns None instead of a new sorted Series.


In [108]:
sales

1    10.0
2     6.0
3    25.0
4    36.0
5     2.0
6     0.0
7     NaN
dtype: float64

In [109]:
sales.sort_values(inplace=False)

6     0.0
5     2.0
2     6.0
1    10.0
3    25.0
4    36.0
7     NaN
dtype: float64

In [110]:
sales.sort_values(ascending=False, na_position="last", inplace= True)

In [111]:
sales

4    36.0
3    25.0
1    10.0
2     6.0
5     2.0
6     0.0
7     NaN
dtype: float64
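
The difference between the default behavior and `inplace=True` can be made explicit by looking at the return value. A minimal sketch with a shortened, made-up `sales` Series:

```python
import pandas as pd

sales = pd.Series({1: 10, 3: 25, 2: 6})

# Without inplace, a new sorted Series is returned and the original is untouched.
sorted_copy = sales.sort_values()
print(list(sales.index))        # [1, 3, 2]
print(list(sorted_copy.index))  # [2, 1, 3]

# With inplace=True the method returns None and changes sales itself.
result = sales.sort_values(inplace=True)
print(result)                   # None
print(list(sales.index))        # [2, 1, 3]
```

Because the in-place call returns `None`, writing `sales = sales.sort_values(inplace=True)` would replace the Series with `None`.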

In [112]:
dic = {"Mon":10, "Tue":25, "Wed":6, "Thu": 36, "Fri": 2}
dic

{'Mon': 10, 'Tue': 25, 'Wed': 6, 'Thu': 36, 'Fri': 2}

In [113]:
sales = pd.Series(dic)

In [114]:
sales

Mon    10
Tue    25
Wed     6
Thu    36
Fri     2
dtype: int64

In [115]:
sales.sort_index(ascending=False)

Wed     6
Tue    25
Thu    36
Mon    10
Fri     2
dtype: int64

### nlargest() and nsmallest()

In [116]:
import pandas as pd

In [117]:
titanic = pd.read_csv("titanic.csv")

In [118]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


In [124]:
age = titanic.age

In [125]:
age.head()

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: age, dtype: float64

In [121]:
age.sort_values(ascending=False).head(3)

630    80.0
851    74.0
493    71.0
Name: age, dtype: float64

In [122]:
age.sort_values(ascending=True).iloc[:3]

803    0.42
755    0.67
644    0.75
Name: age, dtype: float64

In [127]:
age.nlargest(n=3).index[0]
# nlargest(n=3) returns the 3 largest values, largest first;
# .index[0] is then the index label of the single largest value

np.int64(630)

In [128]:
age.nlargest(n=3)

630    80.0
851    74.0
96     71.0
Name: age, dtype: float64

In [129]:
age.nsmallest(n = 3).index[0]

np.int64(803)

In [130]:
age.nsmallest(n=3)

803    0.42
755    0.67
469    0.75
Name: age, dtype: float64
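
In the outputs above, the third row differs between `sort_values()` and `nlargest()` / `nsmallest()` (for example 493 versus 96 for age 71.0). This is presumably a tie between equal values, which the two approaches may order differently. A sketch on made-up data that reuses those index labels shows how `nlargest()` handles ties through its `keep` parameter:

```python
import pandas as pd

ages = pd.Series([71.0, 80.0, 74.0, 71.0], index=[96, 630, 851, 493])

# keep="first" (the default) breaks ties by order of appearance.
top3 = ages.nlargest(3)
print(top3.index.tolist())      # [630, 851, 96]

# keep="all" returns every tied row, so the result can be longer than n.
top3_all = ages.nlargest(3, keep="all")
print(top3_all.index.tolist())  # contains both 96 and 493
```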

### idxmin() and idxmax()

In [131]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


In [132]:
titanic.age.idxmax() 
# returns the index label of the maximum value in a Series (or column of a DataFrame).

630

In [133]:
titanic.age.idxmin()

803

In [134]:
titanic.loc[630]

survived       1
pclass         1
sex         male
age         80.0
sibsp          0
parch          0
fare        30.0
embarked       S
deck           A
Name: 630, dtype: object

In [135]:
titanic.loc[titanic.age.idxmin()]

survived         1
pclass           3
sex           male
age           0.42
sibsp            0
parch            1
fare        8.5167
embarked         C
deck           NaN
Name: 803, dtype: object

In [154]:
dic = {"Mon":10,"Tue":25, "Wed":6, "Thu":36, "Fri":2, "Sat":0, "Sun":None,'Tat':0 }
# dic

In [155]:
sales = pd.Series(dic)
sales

Mon    10.0
Tue    25.0
Wed     6.0
Thu    36.0
Fri     2.0
Sat     0.0
Sun     NaN
Tat     0.0
dtype: float64

In [156]:
sales.sort_values(ascending=True).index[0]

'Tat'

In [157]:
sales.idxmin()

'Sat'

In [153]:
sales.sort_values(ascending=False)

Thu    36.0
Tue    25.0
Mon    10.0
Wed     6.0
Fri     2.0
Sat     0.0
Tat     0.0
Sun     NaN
dtype: float64

In [158]:
sales.idxmax()

'Thu'
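
The cells above show `sort_values().index[0]` returning `'Tat'` while `idxmin()` returns `'Sat'`, although both hold the value 0. A likely explanation is that the default sort algorithm does not guarantee the order of tied values, whereas `idxmin()` returns the first occurrence. The sketch below, on a shortened made-up Series, uses `kind="stable"` to make the sort agree with `idxmin()`:

```python
import numpy as np
import pandas as pd

sales = pd.Series({"Fri": 2.0, "Sat": 0.0, "Sun": np.nan, "Tat": 0.0})

# idxmin() skips NaN and returns the first label holding the minimum.
print(sales.idxmin())  # Sat

# A stable sort keeps tied values in their original order, so it agrees with idxmin().
first_sorted = sales.sort_values(kind="stable").index[0]
print(first_sorted)    # Sat
```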

### Manipulating Series

In [159]:
import pandas as pd

In [160]:
sales = pd.Series([10,25,6,36,2,0,None,5], index = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun", "Mon"])
sales

Mon    10.0
Tue    25.0
Wed     6.0
Thu    36.0
Fri     2.0
Sat     0.0
Sun     NaN
Mon     5.0
dtype: float64

In [161]:
sales["Sun"] = 0

In [162]:
sales

Mon    10.0
Tue    25.0
Wed     6.0
Thu    36.0
Fri     2.0
Sat     0.0
Sun     0.0
Mon     5.0
dtype: float64

In [163]:
sales.iloc[3] = 30

In [None]:
sales

In [164]:
(sales/1.1).round(2)

Mon     9.09
Tue    22.73
Wed     5.45
Thu    27.27
Fri     1.82
Sat     0.00
Sun     0.00
Mon     4.55
dtype: float64

In [165]:
sales_EUR = (sales/1.1).round(2)
sales_EUR

Mon     9.09
Tue    22.73
Wed     5.45
Thu    27.27
Fri     1.82
Sat     0.00
Sun     0.00
Mon     4.55
dtype: float64

In [None]:
sales = (sales/1.1).round(2)

In [None]:
sales

In [None]:
sales["Mon"] = 0

In [None]:
sales
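
Because the index contains `"Mon"` twice, assigning by label affects both rows, while assigning by position affects exactly one. A minimal sketch with a shortened, made-up Series:

```python
import pandas as pd

sales = pd.Series([10.0, 25.0, 5.0], index=["Mon", "Tue", "Mon"])

# Label-based assignment hits every row with that label.
sales["Mon"] = 0
print(sales.tolist())  # [0.0, 25.0, 0.0]

# Position-based assignment changes exactly one row.
sales.iloc[2] = 5
print(sales.tolist())  # [0.0, 25.0, 5.0]
```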

In [167]:
titanic = pd.read_csv("titanic.csv")

In [168]:
titanic.head()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,deck
0,0,3,male,22.0,1,0,7.25,S,
1,1,1,female,38.0,1,0,71.2833,C,C
2,1,3,female,26.0,0,0,7.925,S,
3,1,1,female,35.0,1,0,53.1,S,C
4,0,3,male,35.0,0,0,8.05,S,


In [169]:
age = titanic["age"]

In [170]:
age.head()

0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: age, dtype: float64

In [171]:
age.tail()

886    27.0
887    19.0
888     NaN
889    26.0
890    32.0
Name: age, dtype: float64

In [172]:
age.iloc[1] = 30 

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  age.iloc[1] = 30


In [173]:
age.head()

0    22.0
1    30.0
2    26.0
3    35.0
4    35.0
Name: age, dtype: float64
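
The warning above appears because `age` was selected from `titanic`, so pandas cannot tell whether the assignment is meant to change the DataFrame as well. Two common ways to make the intent explicit are sketched below on a hypothetical two-row stand-in for the `titanic` DataFrame:

```python
import pandas as pd

# Hypothetical two-row stand-in for the titanic DataFrame.
titanic = pd.DataFrame({"age": [22.0, 38.0], "sex": ["male", "female"]})

# Option 1: work on an explicit copy; the DataFrame is left alone.
age = titanic["age"].copy()
age.iloc[1] = 30
print(titanic.loc[1, "age"])  # 38.0

# Option 2: write to the DataFrame directly with a single .loc call.
titanic.loc[1, "age"] = 30
print(titanic.loc[1, "age"])  # 30.0
```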

In [None]:
titanic.head()