# Pandas 
### Introduction: 
Pandas provides data structures and data manipulation tools designed to make data cleaning and analysis in Python quick and straightforward. It is frequently used alongside numerical computing libraries such as NumPy and SciPy, analytical libraries such as statsmodels and scikit-learn, and data visualization libraries such as matplotlib. This Jupyter notebook covers the fundamentals of what the Pandas library can do for you.

### Contents
<ol>
    <li>Series</li>
    <li>Dataframes</li>
</ol>

###### Imports 

In [1]:
import pandas as pd 

### 1. Series 
A Pandas Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.).

In [2]:
# Create a Series object
serie= pd.Series([4, 3, 5, 7])
serie

0    4
1    3
2    5
3    7
dtype: int64
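
The `dtype` line at the bottom of the output is inferred from the data passed in. The following is a minimal sketch of that inference; the variable names `ints`, `floats`, and `mixed` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical example Series, one per kind of input
ints = pd.Series([4, 3, 5, 7])             # whole numbers only
floats = pd.Series([4.0, 3.5, 5.0, 7.25])  # decimal numbers
mixed = pd.Series([4, 'three', 5.0])       # mixed Python types

print(ints.dtype)    # an integer dtype (int64 in the output above)
print(floats.dtype)  # float64
print(mixed.dtype)   # object: the generic dtype for mixed values
```

A Series holding mixed types falls back to the generic `object` dtype, which is usually slower than a numeric dtype.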

In [3]:
# It is a set of indexed values
# Values
print('Values:', serie.values)

# indexes
print('Indexes:', serie.index)

Values: [4 3 5 7]
Indexes: RangeIndex(start=0, stop=4, step=1)


In the previous example, we passed only the data [4, 3, 5, 7], so pandas assigned the default integer index (0 to 3). We can also set the index labels explicitly:

In [4]:
series1= pd.Series([4, 3, 5, 7], index=['A', 'B', 'C', 'D'])
series1

A    4
B    3
C    5
D    7
dtype: int64

In [5]:
# Extract a value by index
series1['A']

4

In [6]:
# Set a value by index
series1['A']= 10
series1

A    10
B     3
C     5
D     7
dtype: int64

In [7]:
# Extract a set of values by index labels
series1[['A', 'D']]

A    10
D     7
dtype: int64
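
Square brackets look values up by label here. As a hedged sketch, the explicit `.loc` (label-based) and `.iloc` (position-based) accessors make the intent clearer; the Series is rebuilt below so the block runs on its own, and the result names are illustrative.

```python
import pandas as pd

# Same data as series1 after 'A' was set to 10
series1 = pd.Series([10, 3, 5, 7], index=['A', 'B', 'C', 'D'])

by_label = series1.loc['D']         # look up by index label
by_position = series1.iloc[3]       # look up by integer position

label_slice = series1.loc['B':'C']  # label slices include both endpoints
pos_slice = series1.iloc[1:3]       # position slices exclude the end

print(by_label, by_position)
print(label_slice.tolist(), pos_slice.tolist())
```

Both slices select the values at labels `B` and `C`, even though the end points are written differently.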

In [8]:
# Multiply values by a number 
series1 * 2

A    20
B     6
C    10
D    14
dtype: int64

In [9]:
# Check if an index exists in the Series object 
print('A' in series1)
print('G' in series1)

True
False
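
Note that `in` tests the index labels, not the stored values. A small sketch of the difference, with the Series rebuilt so the block is self-contained (the result names are illustrative):

```python
import pandas as pd

series1 = pd.Series([10, 3, 5, 7], index=['A', 'B', 'C', 'D'])

label_found = 'A' in series1        # 'A' is an index label
value_as_label = 10 in series1      # 10 is a value, not a label
value_found = 10 in series1.values  # search the underlying array instead
mask = series1.isin([3, 7])         # element-wise membership test on values

print(label_found, value_as_label, value_found)
print(mask.tolist())
```

To test for values, search `series1.values` or use `isin()`, which returns a boolean Series that can also be used for filtering.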


In [10]:
# Initialise a Series object from a dictionary
# Define a dictionary
data= {'Volvo': 3, 'Mercedes': 6, 'Renault': 67}

# Create the Series object
series2= pd.Series(data)

# Show series content 
series2

Volvo        3
Mercedes     6
Renault     67
dtype: int64

In [11]:
# Check whether a Series contains NaN values using the isna() method
# isna() returns True where the value is NaN, otherwise False (isnull() is an alias)
series2.isna()

Volvo       False
Mercedes    False
Renault     False
dtype: bool
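
Every entry is `False` above because `series2` has no missing data. As an illustration, one way a missing value can arise is by passing an index that contains a label absent from the dictionary; `'Toyota'` and `series3` below are hypothetical names chosen for this sketch.

```python
import pandas as pd

data = {'Volvo': 3, 'Mercedes': 6, 'Renault': 67}

# 'Toyota' is not a key of `data`, so pandas fills its value with NaN
series3 = pd.Series(data, index=['Volvo', 'Mercedes', 'Toyota'])

print(series3)
print(series3.isna())

# Summing the boolean mask counts the missing values
missing_count = series3.isna().sum()
print(missing_count)
```

Because NaN is a floating-point value, the resulting Series is stored as `float64` rather than an integer dtype.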

In [12]:
# Filtering pandas Series
# Keep the values that are greater than 2
series2[series2> 2]

Volvo        3
Mercedes     6
Renault     67
dtype: int64

In [13]:
# Filter by another condition: keep only the even values
series2[series2 % 2 == 0]

Mercedes    6
dtype: int64

### 2. Dataframes
A dataframe is a two-dimensional, table-like data structure used to store and modify data in a structured way. It is a fundamental data structure in the pandas library and is frequently used in data analysis and manipulation. Let's start by creating one:

In [14]:
# Create a dataframe with three columns: Player, nation, number of World Cup trophies 
data= {'player': ['Ronaldo', 'Messi', 'Pele', 'Maradonna', 'Ronaldinho', 'Beckham'],
       'nation': ['Portugal', 'Argentina', 'Brazil', 'Argentina', 'Brazil', 'England'],
       'world cup': [0, 1, 3, 1, 1, 0]}

df= pd.DataFrame(data)

# display dataframe
df

|   | player | nation | world cup |
|---|---|---|---|
| 0 | Ronaldo | Portugal | 0 |
| 1 | Messi | Argentina | 1 |
| 2 | Pele | Brazil | 3 |
| 3 | Maradonna | Argentina | 1 |
| 4 | Ronaldinho | Brazil | 1 |
| 5 | Beckham | England | 0 |


In [15]:
# columns 
df.columns

Index(['player', 'nation', 'world cup'], dtype='object')

In [16]:
# To display the first 5 rows
df.head()

|   | player | nation | world cup |
|---|---|---|---|
| 0 | Ronaldo | Portugal | 0 |
| 1 | Messi | Argentina | 1 |
| 2 | Pele | Brazil | 3 |
| 3 | Maradonna | Argentina | 1 |
| 4 | Ronaldinho | Brazil | 1 |


In [17]:
# To display the last 5 rows
df.tail()

|   | player | nation | world cup |
|---|---|---|---|
| 1 | Messi | Argentina | 1 |
| 2 | Pele | Brazil | 3 |
| 3 | Maradonna | Argentina | 1 |
| 4 | Ronaldinho | Brazil | 1 |
| 5 | Beckham | England | 0 |


In [18]:
# shape of dataframe 
df.shape # 6 rows and 3 columns

(6, 3)

In [19]:
# length of dataframe or number of rows
len(df)

6

In [20]:
# Retrieve a dataframe column as a Series object
df['player'] # or df.player

0       Ronaldo
1         Messi
2          Pele
3     Maradonna
4    Ronaldinho
5       Beckham
Name: player, dtype: object
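
The dot shortcut `df.player` can only be written because `player` is a valid Python identifier; `df.world cup` is not valid Python syntax, so a column name containing a space needs bracket access. A short sketch on a trimmed copy of the data (the names `trophies` and `subset` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'player': ['Ronaldo', 'Messi', 'Pele'],
                   'nation': ['Portugal', 'Argentina', 'Brazil'],
                   'world cup': [0, 1, 3]})

# Bracket access works for any column name, including names with spaces
trophies = df['world cup']

# Passing a list of names returns a DataFrame instead of a Series
subset = df[['player', 'world cup']]

print(type(trophies).__name__)
print(type(subset).__name__)
print(subset.shape)
```

A single name gives back a Series, while a list of names (even a list of one) gives back a dataframe.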

In [21]:
# Filter by conditions using loc
# Single condition
df.loc[df['world cup'] > 1]

|   | player | nation | world cup |
|---|---|---|---|
| 2 | Pele | Brazil | 3 |


In [22]:
# two conditions or more 
# And operator: &
# Or operator: | 
df.loc[ (df['world cup'] >= 1) & (df['nation'] == 'Argentina') ]

|   | player | nation | world cup |
|---|---|---|---|
| 1 | Messi | Argentina | 1 |
| 3 | Maradonna | Argentina | 1 |
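
The cell above mentions the Or operator `|` but only demonstrates `&`. Here is a minimal sketch of `|` and of the negation operator `~`, using the same data; the dataframe is rebuilt so the block runs on its own, and the result names are illustrative. Each condition must be wrapped in parentheses, because `&`, `|`, and `~` bind more tightly than comparisons such as `==`.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['Ronaldo', 'Messi', 'Pele', 'Maradonna', 'Ronaldinho', 'Beckham'],
    'nation': ['Portugal', 'Argentina', 'Brazil', 'Argentina', 'Brazil', 'England'],
    'world cup': [0, 1, 3, 1, 1, 0],
})

# Or: players who are Brazilian or have no World Cup trophy
either = df.loc[(df['nation'] == 'Brazil') | (df['world cup'] == 0)]

# Not: invert a condition to keep players with at least one trophy
winners = df.loc[~(df['world cup'] == 0)]

print(either['player'].tolist())
print(winners['player'].tolist())
```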


In [23]:
# Add a new column to the dataframe
goals= list(range(6))
df['goals']= goals
df

|   | player | nation | world cup | goals |
|---|---|---|---|---|
| 0 | Ronaldo | Portugal | 0 | 0 |
| 1 | Messi | Argentina | 1 | 1 |
| 2 | Pele | Brazil | 3 | 2 |
| 3 | Maradonna | Argentina | 1 | 3 |
| 4 | Ronaldinho | Brazil | 1 | 4 |
| 5 | Beckham | England | 0 | 5 |


In [24]:
# Dropping columns 
df.drop(columns=['goals'], inplace= True)
df

|   | player | nation | world cup |
|---|---|---|---|
| 0 | Ronaldo | Portugal | 0 |
| 1 | Messi | Argentina | 1 |
| 2 | Pele | Brazil | 3 |
| 3 | Maradonna | Argentina | 1 |
| 4 | Ronaldinho | Brazil | 1 |
| 5 | Beckham | England | 0 |
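
With `inplace=True`, `drop()` modifies `df` directly. Without it, `drop()` leaves the original untouched and returns a new dataframe, which is often easier to reason about. A small sketch on a shortened, illustrative dataframe (the names `trimmed` and `without_first` are assumptions for this example):

```python
import pandas as pd

df = pd.DataFrame({'player': ['Ronaldo', 'Messi'],
                   'nation': ['Portugal', 'Argentina'],
                   'goals': [0, 1]})

# Without inplace=True, drop() returns a new dataframe
trimmed = df.drop(columns=['goals'])

# Rows can be dropped the same way, by index label
without_first = df.drop(index=[0])

print(trimmed.columns.tolist())       # 'goals' is gone here
print(df.columns.tolist())            # the original still has 'goals'
print(without_first['player'].tolist())
```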


In [25]:
# Flag duplicated rows (True marks a row that repeats an earlier one)
df.duplicated()

0    False
1    False
2    False
3    False
4    False
5    False
dtype: bool
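
Every row is flagged `False` because no row repeats another across all columns. As a sketch, `duplicated()` can also be restricted to some columns with `subset`, and `drop_duplicates()` removes the flagged rows; the dataframe is rebuilt so the block runs on its own, and the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'player': ['Ronaldo', 'Messi', 'Pele', 'Maradonna', 'Ronaldinho', 'Beckham'],
    'nation': ['Portugal', 'Argentina', 'Brazil', 'Argentina', 'Brazil', 'England'],
    'world cup': [0, 1, 3, 1, 1, 0],
})

# Flag rows whose 'nation' already appeared in an earlier row
repeat_nation = df.duplicated(subset=['nation'])
print(repeat_nation.tolist())

# Keep only the first row seen for each nation
first_per_nation = df.drop_duplicates(subset=['nation'])
print(first_per_nation['player'].tolist())
```

By default the first occurrence is kept and later repeats are flagged; the `keep` parameter of both methods changes that behaviour.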