# Pandas

Pandas is a powerful and versatile library that simplifies data manipulation in Python. It is built on top of the NumPy library and is particularly well suited to tabular data, such as spreadsheets or SQL tables. Its versatility and ease of use make it an essential tool for data analysts, scientists, and engineers working with structured data in Python.

# What can you do using Pandas?
Pandas is widely used in data science because it works well alongside the other libraries in that ecosystem. It is built on top of NumPy, so many NumPy structures are used or replicated in Pandas. Data prepared with Pandas is often passed as input to the plotting functions of Matplotlib, the statistical routines of SciPy, and the machine learning algorithms of Scikit-learn.

# Before you start

You should be familiar with [NumPy](Numpy.ipynb) to understand terms such as `axis`, `rank`, and `label`.



## Getting Started with Pandas

### Installing Pandas

- `pip install pandas`


### Import pandas

In [40]:
import pandas as pd

# Pandas Data Structures

Pandas provides two main data structures for manipulating data:

* Series
* DataFrame


## Series

A Pandas Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.

A Series can be thought of as a single column in an Excel sheet. Labels need not be unique but must be of a hashable type. A Series supports both integer and label-based indexing and provides methods for operations involving the index.




In [41]:
# import pandas as pd
import pandas as pd
 
# simple array
data = [1, 2, 3, 4]
 
ser = pd.Series(data)
print(ser)

0    1
1    2
2    3
3    4
dtype: int64
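
The label-based and integer-based indexing mentioned above can be sketched as follows. The fruit names and prices are made-up illustration data, not part of the original notebook.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
prices = pd.Series([1.5, 2.0, 3.25], index=["apple", "banana", "cherry"])

by_label = prices.loc["banana"]            # label-based lookup
by_position = prices.iloc[0]               # position-based lookup
subset = prices.loc[["apple", "cherry"]]   # several labels at once

print(by_label, by_position)
print(subset)
```

Using `.loc` for labels and `.iloc` for positions keeps the two kinds of lookup unambiguous, which matters most when the labels are themselves integers.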


![](https://media.geeksforgeeks.org/wp-content/uploads/dataSER-1.png)

## DataFrame

Pandas DataFrame is a two-dimensional data structure with labeled axes (rows and columns). The data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components: the data, rows, and columns.

![](https://3954911119-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LbBOSivlH5hwcpk3QX6%2F-Lbf5w7csok_uXkzPn8W%2F-Lbf6RFZAvH57OTSeKy7%2F4410.png?alt=media&token=f356576f-606a-4e32-86e4-fe73abc86fe9)
    

In [42]:
df = pd.DataFrame({"Name": ["Braund Harris",
                            "Allen Henry",
                            "Bonnell Elizabeth",
                            ],
                   "Age": [22, 35, 58],
                   "Gender": ["male", "male", "female"],
                   })
print(df)

                Name  Age  Gender
0      Braund Harris   22    male
1        Allen Henry   35    male
2  Bonnell Elizabeth   58  female


In practice, a DataFrame or Series is usually created by loading a dataset from existing storage such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be built directly from Python objects: a list, a list of lists, a dictionary, or a list of dictionaries. For example, a single list produces a one-column DataFrame, while a list of lists produces one row per inner list.
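
As a minimal sketch of two of these construction routes, the example below builds the same small table from a list of lists and from a list of dictionaries. The variable names are chosen here for illustration only.

```python
import pandas as pd

# From a list of lists: each inner list is one row; column names are passed separately
rows = [["Braund Harris", 22], ["Allen Henry", 35]]
df_from_lists = pd.DataFrame(rows, columns=["Name", "Age"])

# From a list of dictionaries: each dictionary is one row; keys become column names
records = [
    {"Name": "Braund Harris", "Age": 22},
    {"Name": "Allen Henry", "Age": 35},
]
df_from_dicts = pd.DataFrame(records)

print(df_from_lists)
print(df_from_dicts)
```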

In [43]:
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   Gender  3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


In [44]:
df.head(1)  # first row; not displayed, because a cell shows only its last expression
df.tail(2)  # last two rows

Unnamed: 0,Name,Age,Gender
1,Allen Henry,35,male
2,Bonnell Elizabeth,58,female


In [45]:
ages = df['Age']
ages

type(ages)

pandas.core.series.Series

## Attributes of the DataFrame/Series

Every DataFrame/Series is an object with the following attributes:

### `shape` - The shape of the Series/Dataframe

In [46]:
df.shape

(3, 3)

### `columns`  - columns of the DataFrame

In [47]:
df.columns

Index(['Name', 'Age', 'Gender'], dtype='object')

Each column has a position in `df.columns`. We can get a column's name from its position with the syntax:
`df.columns[index]`

In [48]:
df.columns[0] # first column

'Name'

In [49]:
df.columns[1] # second column

'Age'

In [50]:
df.columns[-1] # last column

'Gender'

### `dtypes` - the type of the columns in DataFrame 

In [51]:
df.dtypes 

Name      object
Age        int64
Gender    object
dtype: object

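### `values` - returns a NumPy array containing all values of the Series/DataFrame

The output below was captured after the Titanic dataset from the next section had been loaded into `df` and the `New_Fare` and `Gender` columns had been added, so it shows those rows rather than the three-row example above.

In [79]:
df.values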

array([[1, 0, 3, ..., 'S', 8.7, 'male'],
       [2, 1, 1, ..., 'C', 85.53996, 'female'],
       [3, 1, 3, ..., 'S', 9.51, 'female'],
       ...,
       [889, 0, 3, ..., 'S', 28.139999999999997, 'female'],
       [890, 1, 1, ..., 'C', 36.0, 'male'],
       [891, 0, 3, ..., 'Q', 9.299999999999999, 'male']], dtype=object)

## Methods in pandas

### `read_csv()` - read a CSV file from a local path or a URL

In [52]:
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')


In [53]:
# df = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/ML/Datasets/Titanic.csv')
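
`read_csv()` also accepts options that control what gets loaded. The sketch below uses an in-memory string in place of a real file or URL so that it runs without network access; the CSV content is a hypothetical stand-in, while `usecols`, `index_col`, and `nrows` are standard `read_csv` parameters.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file or URL
csv_text = """PassengerId,Name,Age,Fare
1,Braund,22,7.25
2,Cumings,38,71.2833
3,Heikkinen,26,7.925
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["PassengerId", "Name", "Age"],  # load only these columns
    index_col="PassengerId",                 # use this column as the row index
    nrows=2,                                 # read only the first two data rows
)
print(sample)
```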

### `info()` - Display details of dataframe

In [54]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


### Display the data
* `tail(n)`: returns the last n rows of the DataFrame or Series.
* `head(n)`: returns the first n rows of the DataFrame or Series.

In [55]:
df.tail(5)
df.head(5)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S



# Manipulating a DataFrame

## Filter by columns

In [57]:
pnames = df["Name"] 

In [58]:
age_sex = df[["Age", "Sex"]] #filter on multiple columns
age_sex.head(5)

Unnamed: 0,Age,Sex
0,22.0,male
1,38.0,female
2,26.0,female
3,35.0,female
4,35.0,male


## Filter by rows [and columns]

In [59]:
df.loc[2,:] #filter data of row 2 (third row), all columns

PassengerId                         3
Survived                            1
Pclass                              3
Name           Heikkinen, Miss. Laina
Sex                            female
Age                              26.0
SibSp                               0
Parch                               0
Ticket               STON/O2. 3101282
Fare                            7.925
Cabin                             NaN
Embarked                            S
Name: 2, dtype: object

In [60]:
#filter data of third row to 10th row, 3rd column to fifth column
df.iloc[2:10, 2:5]

Unnamed: 0,Pclass,Name,Sex
2,3,"Heikkinen, Miss. Laina",female
3,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female
4,3,"Allen, Mr. William Henry",male
5,3,"Moran, Mr. James",male
6,1,"McCarthy, Mr. Timothy J",male
7,3,"Palsson, Master. Gosta Leonard",male
8,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female
9,2,"Nasser, Mrs. Nicholas (Adele Achem)",female
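
The difference between `loc` (labels) and `iloc` (positions) is easiest to see when the row labels are not 0, 1, 2, and so on. The following sketch uses a small made-up frame with labels 10 to 40; none of these names come from the Titanic data.

```python
import pandas as pd

frame = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid", "Dee"], "Age": [22, 35, 58, 41]},
    index=[10, 20, 30, 40],  # labels deliberately differ from positions
)

by_label = frame.loc[10:30, "Name"]   # labels 10 through 30; the end label is included
by_position = frame.iloc[0:2, 0]      # positions 0 and 1; the end position is excluded

print(list(by_label))
print(list(by_position))
```

With the default `RangeIndex` the labels and positions coincide, which is why `df.loc[2, :]` and `df.iloc[2, :]` return the same row on the Titanic data.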


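**Other ways:**

This cell was run after the assignment and column-addition steps shown later in the notebook, so its output includes the `anonymous` names and the `New_Fare` column; the last column, which is left out, is `Gender`.

In [88]:
df.iloc[:,:-1] # all rows, every column except the last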

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,New_Fare
0,1,0,3,anonymous,male,22.0,1,0,A/5 21171,7.2500,,S,8.70000
1,2,1,1,anonymous,female,38.0,1,0,PC 17599,71.2833,C85,C,85.53996
2,3,1,3,anonymous,female,26.0,0,0,STON/O2. 3101282,7.9250,,S,9.51000
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S,63.72000
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S,9.66000
...,...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S,15.60000
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S,36.00000
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S,28.14000
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C,36.00000


## Filter with condition

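### Conditional expressions

Comparison operators (`>`, `<`, `==`, `!=`, `<=`, `>=`) applied to a column return a Series of boolean values (`True` or `False`). Passing that Series inside the selection brackets keeps only the rows where it is `True`.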



In [61]:
above_35 = df[df["Age"] > 35]
above_35

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
11,12,1,1,"Bonnell, Miss. Elizabeth",female,58.0,0,0,113783,26.5500,C103,S
13,14,0,3,"Andersson, Mr. Anders Johan",male,39.0,1,5,347082,31.2750,,S
15,16,1,2,"Hewlett, Mrs. (Mary D Kingcome)",female,55.0,0,0,248706,16.0000,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
865,866,1,2,"Bystrom, Mrs. (Karolina)",female,42.0,0,0,236852,13.0000,,S
871,872,1,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",female,47.0,1,1,11751,52.5542,D35,S
873,874,0,3,"Vander Cruyssen, Mr. Victor",male,47.0,0,0,345765,9.0000,,S
879,880,1,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",female,56.0,0,1,11767,83.1583,C50,C


In [62]:
adult_names = df.loc[df["Age"] > 35, "Name"]

In [63]:
adult_names

1      Cumings, Mrs. John Bradley (Florence Briggs Th...
6                                McCarthy, Mr. Timothy J
11                              Bonnell, Miss. Elizabeth
13                           Andersson, Mr. Anders Johan
15                      Hewlett, Mrs. (Mary D Kingcome) 
                             ...                        
865                             Bystrom, Mrs. (Karolina)
871     Beckwith, Mrs. Richard Leonard (Sallie Monypeny)
873                          Vander Cruyssen, Mr. Victor
879        Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)
885                 Rice, Mrs. William (Margaret Norton)
Name: Name, Length: 217, dtype: object
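
Several conditions can be combined in a single filter. The sketch below uses a small made-up frame (not the Titanic data) to show `&`, `~`, and `notna()`, including how a missing value behaves in a comparison.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["Ann", "Bob", "Cid", "Dee"],
        "Age": [22, 40, np.nan, 58],
        "Sex": ["female", "male", "male", "female"],
    }
)

# & means "and", | means "or", ~ means "not"; each condition needs its own parentheses
women_over_35 = people[(people["Age"] > 35) & (people["Sex"] == "female")]

# A comparison with NaN is False, so negating it keeps the row with the missing age
not_over_35 = people[~(people["Age"] > 35)]

# Keep only the rows where the age is known
age_known = people[people["Age"].notna()]

print(women_over_35)
print(not_over_35)
print(age_known)
```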

### `isin()`

Similar to a conditional expression, the `isin()` function returns `True` for each row whose value is in the provided list.

In [64]:
# filter rows for which the class is either 2 or 3

class_23 = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]  # | is the element-wise "or" operator
class_23 = df[df["Pclass"].isin([2, 3])]  # equivalent, using isin()
class_23

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.0750,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
884,885,0,3,"Sutehall, Mr. Henry Jr",male,25.0,0,0,SOTON/OQ 392076,7.0500,,S
885,886,0,3,"Rice, Mrs. William (Margaret Norton)",female,39.0,0,5,382652,29.1250,,Q
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S


### Another example

When a subset of both rows and columns is selected in one go, the selection brackets `[]` alone are no longer sufficient. The `loc`/`iloc` operators are required in front of the selection brackets. With `loc`/`iloc`, the part before the comma selects the rows and the part after the comma selects the columns.

In [65]:
adult_names = df.loc[df["Age"] > 35, "Name"]


## Assign value

When selecting specific rows and/or columns with loc or iloc, new values can be assigned to the selected data. For example, to assign the name anonymous to the first 3 elements of the fourth column:

In [66]:
df.iloc[0:3, 3] = "anonymous"
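
Assignment also works with a boolean condition in the row position of `loc`. The following is a minimal sketch on a made-up frame; the `Group` column and its values are hypothetical and only illustrate the pattern.

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Ann", "Bob", "Cid"], "Age": [22, 40, 58]})

people["Group"] = "young"                            # new column, same value in every row
people.loc[people["Age"] > 35, "Group"] = "senior"   # overwrite only the matching rows

print(people)
```

Writing through `loc` in a single step modifies `people` itself, whereas chaining two selections (for example `people[people["Age"] > 35]["Group"] = ...`) may act on a temporary copy.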

# Add, drop, and rename columns

## Add a column from an existing column

In [67]:
df['New_Fare'] = df['Fare']*1.2
df['Gender'] = df['Sex']
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 14 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
 12  New_Fare     891 non-null    float64
 13  Gender       891 non-null    object 
dtypes: float64(3), int64(5), object(6)
memory usage: 97.6+ KB


In [68]:
df_dropped = df.drop(['New_Fare','Gender'], axis=1)
df_dropped.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


In [69]:
df_renamed = df.rename(
    columns={
        "Age": "Passenger_Age",
        "Sex": "Gender",
    }
)
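
Both `drop()` and `rename()` return a new DataFrame and leave the original untouched unless the result is assigned back. A short sketch on a made-up frame:

```python
import pandas as pd

frame = pd.DataFrame(
    {"Age": [22, 38], "Sex": ["male", "female"], "Fare": [7.25, 71.28]}
)

renamed = frame.rename(columns={"Sex": "Gender"})
trimmed = frame.drop(columns=["Fare"])   # same effect as drop(["Fare"], axis=1)

print(list(frame.columns))     # the original keeps its columns
print(list(renamed.columns))
print(list(trimmed.columns))
```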

# Calculate summary statistics

In [70]:
df["Age"].mean()  # average age of the Titanic passengers

29.69911764705882

In [71]:
df[["Age", "Fare"]].mean()

Age     29.699118
Fare    32.204208
dtype: float64

In [72]:
df[["Sex", "Age"]].groupby("Sex").mean()

Unnamed: 0_level_0,Age
Sex,Unnamed: 1_level_1
female,27.915709
male,30.726645


In [73]:
df.groupby("Pclass")["Pclass"].size()

Pclass
1    216
2    184
3    491
Name: Pclass, dtype: int64

In [74]:
df.groupby("Sex")["Fare"].max()

Sex
female    512.3292
male      512.3292
Name: Fare, dtype: float64
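
Several statistics per group can be computed in one call with `agg()`. The sketch below uses a small made-up table of ticket classes and fares rather than the Titanic data.

```python
import pandas as pd

tickets = pd.DataFrame(
    {
        "Pclass": [1, 1, 2, 3, 3, 3],
        "Fare": [80.0, 60.0, 20.0, 8.0, 7.0, 9.0],
    }
)

# One row per class, one column per aggregation
summary = tickets.groupby("Pclass")["Fare"].agg(["count", "mean", "max"])
print(summary)
```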

# Combine dataframes

In [75]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    },
    index=[0, 1, 2, 3],
)
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [76]:
df2 = pd.DataFrame(
    {
        "A": ["A1", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["A3", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    index=[4, 5, 6, 7],
)
df2

Unnamed: 0,A,B,C,D
4,A1,B4,A3,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


In [77]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
print(df1)
df2 = pd.DataFrame(
    {
        "A": ["A1", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["A3", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)
print(df2)

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
    A   B   C   D
0  A1  B4  A3  D4
1  A5  B5  C5  D5
2  A6  B6  C6  D6
3  A7  B7  C7  D7


In [78]:
# DataFrame.append was removed in pandas 2.0; use pd.concat instead
df3 = pd.concat([df1, df2])
df3

## Concatenating objects, i.e., by rows
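
Row-wise concatenation stacks one DataFrame under another. A minimal sketch with two made-up frames, showing the effect of `ignore_index`:

```python
import pandas as pd

top = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
bottom = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

stacked = pd.concat([top, bottom])                        # keeps the original labels: 0, 1, 0, 1
renumbered = pd.concat([top, bottom], ignore_index=True)  # fresh labels: 0, 1, 2, 3

print(stacked)
print(renumbered)
```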

## By columns

In [None]:
result = pd.concat([df1, df2], axis=1)
result

Unnamed: 0,A,B,C,D,A.1,B.1,C.1,D.1
0,A0,B0,C0,D0,,,,
1,A1,B1,C1,D1,,,,
2,A2,B2,C2,D2,,,,
3,A3,B3,C3,D3,,,,
4,,,,,A1,B4,A3,D4
5,,,,,A5,B5,C5,D5
6,,,,,A6,B6,C6,D6
7,,,,,A7,B7,C7,D7


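Note: The `join` keyword of `pd.concat` specifies how to handle labels on the other axis that do not appear in every DataFrame.

`join='outer'` (the default) takes the union of all labels and fills the gaps with `NaN`; `join='inner'` keeps only the labels shared by all inputs.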
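
The two `join` settings can be compared side by side. The frames below are made up for illustration and overlap on row labels 1 and 2 only.

```python
import pandas as pd

left = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
right = pd.DataFrame({"B": ["B1", "B2", "B3"]}, index=[1, 2, 3])

outer = pd.concat([left, right], axis=1)                # union of the row labels
inner = pd.concat([left, right], axis=1, join="inner")  # only the shared row labels

print(outer)
print(inner)
```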

## Inner join

An inner join combines rows that have matching values in both DataFrames. `pd.merge` performs an inner join by default.

In [None]:
result = pd.merge(df1, df2, on='A')
result

Unnamed: 0,A,B_x,C_x,D_x,B_y,C_y,D_y
0,A1,B1,C1,D1,B4,A3,D4
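
Other join types are selected with the `how` parameter of `pd.merge`. The sketch below uses two made-up frames with a shared `key` column; `indicator=True` adds a `_merge` column that records where each row came from.

```python
import pandas as pd

left = pd.DataFrame({"key": ["A0", "A1", "A2"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["A1", "A2", "A3"], "y": [10, 20, 30]})

inner = pd.merge(left, right, on="key")                   # default how="inner"
left_join = pd.merge(left, right, on="key", how="left")   # keep every row of `left`
outer = pd.merge(left, right, on="key", how="outer", indicator=True)

print(inner)
print(left_join)
print(outer)
```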


## Appending a DataFrame

In [None]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows vertically instead
result = pd.concat([df1, df2], ignore_index=True)
result

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A1,B4,A3,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7


# Index

In [None]:
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
df1

Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3


In [None]:
df2 = pd.DataFrame(
    {
        "A": ["A1", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["A3", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)
df2

Unnamed: 0,A,B,C,D
0,A1,B4,A3,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7


In [None]:
df3 = pd.concat([df1, df2])  # replaces the removed DataFrame.append
df3


Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
0,A1,B4,A3,D4
1,A5,B5,C5,D5
2,A6,B6,C6,D6
3,A7,B7,C7,D7


In [None]:
df3.reset_index(inplace = True)
df3

Unnamed: 0,index,A,B,C,D
0,0,A0,B0,C0,D0
1,1,A1,B1,C1,D1
2,2,A2,B2,C2,D2
3,3,A3,B3,C3,D3
4,0,A1,B4,A3,D4
5,1,A5,B5,C5,D5
6,2,A6,B6,C6,D6
7,3,A7,B7,C7,D7


In [None]:
df3.set_index('A', inplace=True)
df3


Unnamed: 0_level_0,B,C,D
A,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A0,B0,C0,D0
A1,B1,C1,D1
A2,B2,C2,D2
A3,B3,C3,D3
A1,B4,A3,D4
A5,B5,C5,D5
A6,B6,C6,D6
A7,B7,C7,D7


In [None]:
df3.reset_index(inplace = True)
df3


Unnamed: 0,A,B,C,D
0,A0,B0,C0,D0
1,A1,B1,C1,D1
2,A2,B2,C2,D2
3,A3,B3,C3,D3
4,A1,B4,A3,D4
5,A5,B5,C5,D5
6,A6,B6,C6,D6
7,A7,B7,C7,D7
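
`reset_index()` and `set_index()` can also be used without `inplace=True`, in which case they return a new DataFrame. The sketch below uses a small made-up frame and additionally shows `drop=True`, which discards the old labels instead of keeping them as a column.

```python
import pandas as pd

frame = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[5, 7])

kept = frame.reset_index()               # old labels move into a column named "index"
dropped = frame.reset_index(drop=True)   # old labels are discarded
keyed = frame.set_index("A")             # column "A" becomes the row index

print(kept)
print(dropped)
print(keyed)
```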


# More methods

## `isna()` 

This method checks each value in the DataFrame for missing data.

Returns: a Series/DataFrame with the same shape as the original. Each value is `True` where the original value is missing and `False` otherwise.

In [None]:
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, np.nan],
    "C": [6, 7, 8]
})

print(data.isna())

       A      B      C
0  False  False  False
1  False  False  False
2  False   True  False
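
A common follow-up is to count the missing values per column and then drop or fill them. A minimal sketch on the same small table:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, np.nan], "C": [6, 7, 8]})

missing_per_column = data.isna().sum()   # True counts as 1, so this counts missing cells
complete_rows = data.dropna()            # keep only rows without missing values
filled = data.fillna(0)                  # replace missing values with 0

print(missing_per_column)
print(complete_rows)
print(filled)
```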


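## `sum()` - sum of the values along an axis

Returns: a Series (for a DataFrame) or a scalar (for a Series) holding the sums of the values in each column or row. Missing values are skipped by default.

In [None]:
import numpy as np
import pandas as pd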

data = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [4, 5, np.nan],
    "C": [6, 7, 8]
})

print(data.sum())

A     6.0
B     9.0
C    21.0
dtype: float64


## `value_counts()` - count how often each unique value (or row, for a DataFrame) occurs

In [None]:
data = pd.Series([1, 2, 3, 4, 5, 1, 2, 3])

print(data.value_counts())

1    2
2    2
3    2
4    1
5    1
Name: count, dtype: int64


In [None]:
data = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 1, 2, 3],
    "B": [6, 7, 8, 9, 10, 6, 7, 8]
})

print(data.value_counts())

A  B 
1  6     2
2  7     2
3  8     2
4  9     1
5  10    1
Name: count, dtype: int64
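
Two related calls are sketched below on the same Series: `value_counts(normalize=True)` returns fractions instead of counts, and `nunique()` returns the number of distinct values.

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4, 5, 1, 2, 3])

shares = data.value_counts(normalize=True)   # fraction of entries per value
distinct = data.nunique()                    # number of distinct values

print(shares)
print(distinct)
```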


Source: 

[geeksforgeeks-dataframe](https://www.geeksforgeeks.org/python-pandas-dataframe/)

[geeksforgeeks-series](https://www.geeksforgeeks.org/python-pandas-series/)