## Pandas Data Frames

- What Is Pandas?
- Pandas vs Numpy 
- Pandas Data Frame Intro
- Pandas Data Frame fundamental operations
    - Creating
    - Selecting/indexing
    - Inserting rows/columns
    - Setting data
    - Filtering
    - Dropping rows/columns
- Dealing with Missing values

%matplotlib inline
import numpy as np
import pandas as pd
from IPython.display import Image
from IPython.display import HTML
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))



In [1]:
from IPython.display import display, HTML

CSS = """
.output {
    align-items: center;
}
div.output_area {
    width: 80%;
}
"""
HTML('<style>{}</style>'.format(CSS))

# What is Pandas?

### - Enables working with tabular and labeled data easily and intuitively
### - Pandas is an open-source library built on top of the NumPy package.
- https://github.com/pandas-dev/pandas
- https://github.com/pandas-dev/pandas/blob/059c8bac51e47d6eaaa3e36d6a293a22312925e6/pandas/core/frame.py

### - Pandas data structures are:
- Series
- Index
- Data Frame
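The three structures can be sketched side by side. This is a minimal illustration with made-up values (`s`, `idx`, `df` are illustrative names, not from the original notebook):

```python
import pandas as pd

# Series: a one-dimensional array of values with index labels
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Index: the immutable sequence of labels used by Series and DataFrames
idx = s.index

# DataFrame: a two-dimensional table; all its columns share one row Index
df = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]}, index=idx)

print(type(s).__name__, type(idx).__name__, type(df).__name__)
print(df)
```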
    

## Quick refresher on NumPy arrays..
- contain numerical, ***homogeneous*** data (every element has the same dtype)
- can have any number of dimensions
- are used for fast numerical computation and processing of one-dimensional and multi-dimensional arrays

In [2]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

two_dim_arr = np.random.randint(10, size=(3, 4))  # Two-dimensional array
three_dim_arr = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array


### A two dimensional Array example...

In [3]:
print("Two Dimensional Array")
two_dim_arr

Two Dimensional Array


array([[5, 0, 3, 3],
       [7, 9, 3, 5],
       [2, 4, 7, 6]])

### What I mean by Homogeneous...

In [4]:
print(two_dim_arr)

[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]


**two_dim_arr[0,0] = "Hello"**

In [5]:
printmd("***Oops....***")
two_dim_arr[0,0] = "Hello" 

Oops....

ValueError: invalid literal for int() with base 10: 'Hello'
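The error above follows from homogeneity: a NumPy array has exactly one dtype. A small sketch with made-up values shows both sides of this; when mixed values are passed at creation time, NumPy converts everything to one common type (here a Unicode string dtype) instead of building a "mixed" array:

```python
import numpy as np

ints = np.array([[5, 0], [7, 9]])   # every element shares one integer dtype
print(ints.dtype)

# Mixing numbers and strings at creation time: everything becomes a string
mixed = np.array([1, "Hello"])
print(mixed.dtype, repr(mixed[0]))

# Writing a non-numeric string into an existing integer array fails
raised = False
try:
    ints[0, 0] = "Hello"
except ValueError as err:
    raised = True
    print("ValueError:", err)
```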

### You can build a DataFrame directly from a 2D array

In [6]:
import pandas as pd

In [7]:
print("Data Frame formed by 2D Array")
print("***df=pd.DataFrame(two_dim_arr)***")
df=pd.DataFrame(two_dim_arr)
df

Data Frame formed by 2D Array
***df=pd.DataFrame(two_dim_arr)***


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 5 | 0 | 3 | 3 |
| 1 | 7 | 9 | 3 | 5 |
| 2 | 2 | 4 | 7 | 6 |


### Pandas Data Frame is Heterogeneous!
**df.iloc[0,0]="Hello"**

In [23]:
df.iloc[0,0]="Hello"
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Hello | 0 | 3 | 3 |
| 1 | 7 | 9 | 3 | 5 |
| 2 | 2 | 4 | 7 | 6 |
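More precisely, a DataFrame is heterogeneous *across columns*: each column is stored with its own dtype. A sketch with made-up data follows. As far as I know, recent pandas versions (2.1 and later) emit a FutureWarning when a value of an incompatible type is written into an existing numeric column, as in `df.iloc[0,0]="Hello"` above, so converting the column explicitly first is the more future-proof pattern:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Pol", "Javi"],   # text column
    "age": [22, 20],           # integers -> int64
    "score": [7.5, 8.0],       # floats -> float64
})
print(df.dtypes)               # one dtype per column

# To store text in a numeric column on purpose, convert the column first
df["age"] = df["age"].astype(object)
df.loc[0, "age"] = "unknown"
print(df)
```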


### Pandas Data Frame labels the data with an index (row labels) and column labels
pd.DataFrame(np.random.randint(10,size=(3,2)),
             columns=['foo', 'bar'],
             index=['a', 'b', 'c'])

In [9]:
np.random.randint(10,size=(3,2))

array([[5, 9],
       [4, 4],
       [6, 4]])

In [10]:
##np.random.seed(0)
foo_df=pd.DataFrame(np.random.randint(10,size=(3,2)),
             columns=['foo', 'bar'],
             index=['a','b','c']
                   )
foo_df

|   | foo | bar |
|---|---|---|
| a | 4 | 3 |
| b | 4 | 4 |
| c | 8 | 4 |


In [27]:
foo_df.loc["a",:]

foo    4
bar    3
Name: a, dtype: int64

### A Pandas DataFrame suits statistical data: each row is an observation (data point) and each column a variable (numerical, categorical, etc.)

### It is intuitive...  Look how convenient it is!!

In [29]:
people_df= pd.read_csv("data/people.csv") # use relative path
people_df

|   | name | age | country |
|---|---|---|---|
| 0 | Pol | 22 | ES |
| 1 | Javi | 20 | ES |
| 2 | Maria | 23 | AR |
| 3 | Anna | 24 | FR |
| 4 | Anna | 24 | UK |
| 5 | Javi | 30 | MA |
| 6 | Dog | 2 | XX |


In [30]:
from IPython.display import Image
Image('res/excel-to-pandas.png')

source: https://jalammar.github.io/

### Describing the Data Frame...
- df.info()
- df.count()
- df.describe()
- df.mean()
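The four calls listed above can be tried on a small stand-in for `data/people.csv` (a hypothetical three-row frame with the same columns). One caveat, to the best of my knowledge: since pandas 2.0, `mean()` on a DataFrame that still contains text columns raises a `TypeError`, so `numeric_only=True` is passed here:

```python
import pandas as pd

# Hypothetical stand-in for data/people.csv (same columns as above)
people = pd.DataFrame({
    "name": ["Pol", "Javi", "Maria"],
    "age": [22, 20, 23],
    "country": ["ES", "ES", "AR"],
})

people.info()                           # dtypes, non-null counts, memory usage
print(people.count())                   # non-null values per column
print(people.describe())                # summary statistics of numeric columns
print(people.mean(numeric_only=True))   # mean of numeric columns only
```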

In [30]:
people_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     7 non-null      object
 1   age      7 non-null      int64 
 2   country  7 non-null      object
dtypes: int64(1), object(2)
memory usage: 296.0+ bytes


In [33]:
people_df.describe() # the 50% percentile is the median

|   | age |
|---|---|
| count | 7.0 |
| mean | 20.714286 |
| std | 8.807464 |
| min | 2.0 |
| 25% | 21.0 |
| 50% | 23.0 |
| 75% | 24.0 |
| max | 30.0 |


In [37]:
people_df.iloc[0,1]=22
people_df

|   | name | age | country |
|---|---|---|---|
| 0 | Pol | 22 | ES |
| 1 | Javi | 20 | ES |
| 2 | Maria | 23 | AR |
| 3 | Anna | 24 | FR |
| 4 | Anna | 24 | UK |
| 5 | Javi | 30 | MA |
| 6 | Dog | 2 | XX |


In [31]:
people_df.loc[:,"age"].mean()

20.714285714285715

In [32]:
people_df.age=people_df.age.astype(int)

In [33]:
people_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   name     7 non-null      object
 1   age      7 non-null      int64 
 2   country  7 non-null      object
dtypes: int64(1), object(2)
memory usage: 296.0+ bytes


In [34]:
people_df.iloc[:,0]

0      Pol
1     Javi
2    Maria
3     Anna
4     Anna
5     Javi
6      Dog
Name: name, dtype: object

In [35]:
type(people_df.iloc[:,1])

pandas.core.series.Series

In [36]:
pd.Series([3,4,5])   # a pandas Series is a one-dimensional labeled array (like a single DataFrame column)

0    3
1    4
2    5
dtype: int64
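A Series pairs each value with an index label; by default the labels are 0, 1, 2, ... as above. A short sketch with custom labels (the labels and the name `"counts"` are illustrative):

```python
import pandas as pd

s = pd.Series([3, 4, 5], index=["x", "y", "z"], name="counts")

print(s["y"])        # access by label
print(s.iloc[0])     # access by position
print(s.values)      # the underlying NumPy array
print(s * 2)         # arithmetic is applied element-wise, labels are kept
```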

### Pandas Data Frame operations

In [38]:
from IPython.display import Image
Image("res/CRUD.png")

### Data Frame creation
You can create/form a Data Frame from:
- Dict of 1D ndarrays, lists, dicts, or Series

- 2-D numpy.ndarray

- Structured or record ndarray

- A Series

- Another DataFrame
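Each source in the list above can be passed straight to the `pd.DataFrame` constructor. A compact sketch with toy values (all variable names are illustrative):

```python
import numpy as np
import pandas as pd

# 1) Dict of lists: each key becomes a column
from_dict = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# 2) 2-D NumPy array, with explicit column labels
from_array = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])

# 3) Structured (record) ndarray: field names become column names
records = np.array([(1, 3.0), (2, 4.0)], dtype=[("a", "i4"), ("b", "f8")])
from_records = pd.DataFrame(records)

# 4) A single named Series becomes a one-column DataFrame
from_series = pd.DataFrame(pd.Series([1, 2], name="a"))

# 5) Another DataFrame (copy=True makes an independent copy of the data)
from_frame = pd.DataFrame(from_dict, copy=True)

for frame in (from_dict, from_array, from_records, from_series, from_frame):
    print(list(frame.columns), frame.shape)
```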

#### Here is an example...

In [37]:
print('dic = {"col1": [1.0, 2.0, 3.0, 4.0], "col2": [4.0, 3.0, 2.0, 1.0]}\n')

dic = {"col1": [1.0, 2.0, 3.0, 4.0], "col2": [4.0, 3.0, 2.0, 1.0]}

print("pd.DataFrame(dic)\n\n",pd.DataFrame(dic))

dic = {"col1": [1.0, 2.0, 3.0, 4.0], "col2": [4.0, 3.0, 2.0, 1.0]}

pd.DataFrame(dic)

    col1  col2
0   1.0   4.0
1   2.0   3.0
2   3.0   2.0
3   4.0   1.0


#### creating Index for the Data frame...

In [45]:
df=pd.DataFrame(dic, index=["a", "b", "c", "d"])

In [46]:
df.columns = ["C1","C2"]
df

|   | C1 | C2 |
|---|---|---|
| a | 1.0 | 4.0 |
| b | 2.0 | 3.0 |
| c | 3.0 | 2.0 |
| d | 4.0 | 1.0 |


### Creating Data frame from Pandas Series objects.. 

In [47]:
d = {
        "apples":  pd.Series([3, 2, 0,1]),
        "oranges": pd.Series([0, 3, 7, 2]),
    }

pd.DataFrame(d)

|   | apples | oranges |
|---|---|---|
| 0 | 3 | 0 |
| 1 | 2 | 3 |
| 2 | 0 | 7 |
| 3 | 1 | 2 |
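In the example above both Series share the default index 0 to 3, so the rows line up trivially. When the indexes differ, pandas aligns the Series on their labels: the DataFrame index is the union of both indexes, and missing labels become NaN. A sketch with made-up weekday labels:

```python
import pandas as pd

apples = pd.Series([3, 2, 0], index=["mon", "tue", "wed"])
oranges = pd.Series([7, 2], index=["tue", "thu"])

# Rows are matched by label, not by position; gaps are filled with NaN,
# which also turns these integer columns into float64
fruit = pd.DataFrame({"apples": apples, "oranges": oranges})
print(fruit)
```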


In [41]:
from IPython.display import Image
Image("res/series-and-dataframe.width-1200.png")

source: https://storage.googleapis.com/lds-media/images/series-and-dataframe.width-1200.png

### Data Frame Selection / Indexing

In [49]:
type(df["C1"])

pandas.core.series.Series

In [50]:
pd.Series(dtype="float64")  # pass a dtype explicitly: an empty pd.Series() warns about its default dtype in pandas 1.x

Series([], dtype: float64)

In [89]:
data = {
    'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
    'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
             'Manchester', 'Cairo', 'Osaka'],
    'age': [41, 28, 33, 34, 38, 31, 37],
    'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
}

row_labels = [101, 102, 103, 104, 105, 106, 107]
students_df = pd.DataFrame(data=data, index=row_labels)
students_df


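|   | name | city | age | py-score |
|---|---|---|---|---|
| 101 | Xavier | Mexico City | 41 | 88.0 |
| 102 | Ann | Toronto | 28 | 79.0 |
| 103 | Jana | Prague | 33 | 81.0 |
| 104 | Yi | Shanghai | 34 | 80.0 |
| 105 | Robin | Manchester | 38 | 68.0 |
| 106 | Amal | Cairo | 31 | 61.0 |
| 107 | Nori | Osaka | 37 | 84.0 |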


In [90]:
students_df = students_df.reset_index()

In [91]:
students_df

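|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 101 | Xavier | Mexico City | 41 | 88.0 |
| 1 | 102 | Ann | Toronto | 28 | 79.0 |
| 2 | 103 | Jana | Prague | 33 | 81.0 |
| 3 | 104 | Yi | Shanghai | 34 | 80.0 |
| 4 | 105 | Robin | Manchester | 38 | 68.0 |
| 5 | 106 | Amal | Cairo | 31 | 61.0 |
| 6 | 107 | Nori | Osaka | 37 | 84.0 |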


Source: https://realpython.com/

### Data Selection

In [92]:
students_df.loc[[0,1,2],["name","city","age"]]

|   | name | city | age |
|---|---|---|---|
| 0 | Xavier | Mexico City | 41 |
| 1 | Ann | Toronto | 28 |
| 2 | Jana | Prague | 33 |


In [66]:
students_df.iloc[0,0]

101

### Selecting by Label
- .loc[] (label-based indexer)

In [93]:
#print("students_df.loc[:, 'city']")
students_df.loc[:, 'city']

0    Mexico City
1        Toronto
2         Prague
3       Shanghai
4     Manchester
5          Cairo
6          Osaka
Name: city, dtype: object

In [77]:
print('df[["age", "city"]]')
cities = students_df[["age","city"]]
cities

df[["age", "city"]]


|   | age | city |
|---|---|---|
| 0 | 41 | Mexico City |
| 1 | 28 | Toronto |
| 2 | 33 | Prague |
| 3 | 34 | Shanghai |
| 4 | 38 | Manchester |
| 5 | 31 | Cairo |
| 6 | 37 | Osaka |


In [94]:
print("df.city")
students_df.city

df.city


0    Mexico City
1        Toronto
2         Prague
3       Shanghai
4     Manchester
5          Cairo
6          Osaka
Name: city, dtype: object

### Selecting by Position
- .iloc[] (position-based indexer)

In [95]:
students_df.iloc[0,1]

'Xavier'

In [96]:
print("students_df.iloc[1:6, [0, 1]]")
students_df.iloc[1:6, [0, 1]]

students_df.iloc[1:6, [0, 1]]


|   | index | name |
|---|---|---|
| 1 | 102 | Ann |
| 2 | 103 | Jana |
| 3 | 104 | Yi |
| 4 | 105 | Robin |
| 5 | 106 | Amal |


### Hmm... can you tell the difference between loc and iloc?
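The difference is easiest to see when row labels differ from row positions. A minimal sketch on a made-up frame whose index starts at 10:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Xavier", "Ann", "Jana", "Yi"]},
                  index=[10, 11, 12, 13])

print(df.loc[11, "name"])   # label-based: the row labeled 11 -> 'Ann'
print(df.iloc[1, 0])        # position-based: 2nd row, 1st column -> 'Ann'

# Slices differ too: .loc includes the end label, .iloc excludes the end position
print(df.loc[10:12])        # labels 10, 11 and 12 -> 3 rows
print(df.iloc[0:2])         # positions 0 and 1 -> 2 rows
```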

#### You can also use loc and iloc to select single values, but at[] and iat[] are designed for fast scalar access

In [102]:
print("df.at[3, 'name']")
students_df.at[3, 'name']

df.at[3, 'name']


'Yi'

In [103]:
print('df.iat[4, 0]')
students_df.iat[4, 0]

df.iat[4, 0]


105

### Setting/Updating data

#### Let us first update the DataFrame index..

In [104]:
list(np.arange(10, 17))

[10, 11, 12, 13, 14, 15, 16]

In [105]:
print ('df.index = np.arange(10, 17)')
students_df.index = list(np.arange(10, 17))
students_df.reset_index()

df.index = np.arange(10, 17)


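|   | level_0 | index | name | city | age | py-score |
|---|---|---|---|---|---|---|
| 0 | 10 | 101 | Xavier | Mexico City | 41 | 88.0 |
| 1 | 11 | 102 | Ann | Toronto | 28 | 79.0 |
| 2 | 12 | 103 | Jana | Prague | 33 | 81.0 |
| 3 | 13 | 104 | Yi | Shanghai | 34 | 80.0 |
| 4 | 14 | 105 | Robin | Manchester | 38 | 68.0 |
| 5 | 15 | 106 | Amal | Cairo | 31 | 61.0 |
| 6 | 16 | 107 | Nori | Osaka | 37 | 84.0 |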


In [90]:
students_df=students_df.reset_index()


In [92]:
students_df.loc[1:4, 'py-score'] = [40, 50, 60, 70]
students_df

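|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 10 | Xavier | Mexico City | 41 | 88.0 |
| 1 | 11 | Ann | Toronto | 28 | 40.0 |
| 2 | 12 | Jana | Prague | 33 | 50.0 |
| 3 | 13 | Yi | Shanghai | 34 | 60.0 |
| 4 | 14 | Robin | Manchester | 38 | 70.0 |
| 5 | 15 | Amal | Cairo | 31 | 61.0 |
| 6 | 16 | Nori | Osaka | 37 | 84.0 |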


In [94]:
print("df.loc[3:, 'py-score'] = 70")
students_df.loc[3:, 'py-score'] = 70
students_df

df.loc[3:, 'py-score'] = 70


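|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 10 | Xavier | Mexico City | 41 | 88.0 |
| 1 | 11 | Ann | Toronto | 28 | 40.0 |
| 2 | 12 | Jana | Prague | 33 | 50.0 |
| 3 | 13 | Yi | Shanghai | 34 | 70.0 |
| 4 | 14 | Robin | Manchester | 38 | 70.0 |
| 5 | 15 | Amal | Cairo | 31 | 70.0 |
| 6 | 16 | Nori | Osaka | 37 | 70.0 |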


In [103]:
students_df.iloc[:,-1] = [88.0, 90, 81.0, 80.0, 68.0, 61.0, 84.0] # update all rows in py-score column
students_df

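|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 10 | Xavier | Mexico City | 41 | 88.0 |
| 1 | 11 | Ann | Toronto | 28 | 90.0 |
| 2 | 12 | Jana | Prague | 33 | 81.0 |
| 3 | 13 | Yi | Shanghai | 34 | 80.0 |
| 4 | 14 | Robin | Manchester | 38 | 68.0 |
| 5 | 15 | Amal | Cairo | 31 | 61.0 |
| 6 | 16 | Nori | Osaka | 37 | 84.0 |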


In [127]:
arr=np.random.random(size=(1,7))*100
arr[0]

array([43.3288062 , 75.61066939, 39.60982754, 89.60383875, 63.89210762,
       89.15544372, 68.00555695])

In [128]:
students_df["py-score"]=arr[0]
students_df

|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 10 | Xavier | Mexico City | 41 | 43.328806 |
| 1 | 11 | Ann | Toronto | 28 | 75.610669 |
| 2 | 12 | Jana | Prague | 33 | 39.609828 |
| 3 | 13 | Yi | Shanghai | 34 | 89.603839 |
| 4 | 14 | Robin | Manchester | 38 | 63.892108 |
| 5 | 15 | Amal | Cairo | 31 | 89.155444 |
| 6 | 16 | Nori | Osaka | 37 | 68.005557 |


In [129]:
students_df["py-score"]=students_df["py-score"]  

In [130]:
students_df
students_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   index     7 non-null      int64  
 1   name      7 non-null      object 
 2   city      7 non-null      object 
 3   age       7 non-null      int64  
 4   py-score  7 non-null      float64
dtypes: float64(1), int64(2), object(2)
memory usage: 408.0+ bytes


In [131]:
students_df[["age","py-score"]].info()
students_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7 entries, 0 to 6
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   age       7 non-null      int64  
 1   py-score  7 non-null      float64
dtypes: float64(1), int64(1)
memory usage: 240.0 bytes


|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 0 | 10 | Xavier | Mexico City | 41 | 43.328806 |
| 1 | 11 | Ann | Toronto | 28 | 75.610669 |
| 2 | 12 | Jana | Prague | 33 | 39.609828 |
| 3 | 13 | Yi | Shanghai | 34 | 89.603839 |
| 4 | 14 | Robin | Manchester | 38 | 63.892108 |


In [136]:
students_df["py-score"]=list(map(lambda x: x+10,students_df["py-score"]))

In [137]:
students_df["py-score"]+= 10 
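The two cells above both add 10 to every score: the first calls a lambda once per value through `map`, the second applies one vectorized operation to the whole column. A small sketch on made-up scores confirms they give the same result; the vectorized form is shorter and usually much faster on large columns:

```python
import pandas as pd

scores = pd.Series([88.0, 79.0, 81.0], name="py-score")

# Element by element: Python calls the lambda once per value
looped = pd.Series(list(map(lambda x: x + 10, scores)), name="py-score")

# Vectorized: a single operation on the whole column
vectorized = scores + 10

print(looped.equals(vectorized))
```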

In [176]:
list(students_df.columns[0:])

['index', 'name', 'city', 'age', 'py-score']

In [177]:
students_df = students_df [ list(students_df.columns[1:]) ]

In [178]:
students_df

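|   | name | city | age | py-score |
|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 |
| 1 | Ann | Toronto | 28 | 79.0 |
| 2 | Jana | Prague | 33 | 81.0 |
| 3 | Yi | Shanghai | 34 | 80.0 |
| 4 | Robin | Manchester | 38 | 68.0 |
| 5 | Amal | Cairo | 31 | 61.0 |
| 6 | Nori | Osaka | 37 | 84.0 |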


In [179]:
students_df[["age","py-score"]]

|   | age | py-score |
|---|---|---|
| 0 | 41 | 88.0 |
| 1 | 28 | 79.0 |
| 2 | 33 | 81.0 |
| 3 | 34 | 80.0 |
| 4 | 38 | 68.0 |
| 5 | 31 | 61.0 |
| 6 | 37 | 84.0 |


### Inserting/deleting rows

In [180]:
Ronald = pd.Series(data=['Ronald', 'Berlin', 34, 79],
                 index=students_df.columns[0:4],name=21)

In [144]:
Marcus = pd.Series(data=['Marcus', 'Berlin', 44, 75],
                 index=students_df.columns[0:4])

In [186]:
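# DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0; pd.concat replaces it.
# to_frame().T turns the Series into a one-row DataFrame; infer_objects() restores numeric dtypes.
students_df = pd.concat([students_df, Ronald.to_frame().T.infer_objects()],
                        ignore_index=True)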


In [187]:
students_df

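|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 | 71.0 |
| 1 | Ann | Toronto | 28 | 79.0 | 95.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 3 | Yi | Shanghai | 34 | 80.0 | 79.0 |
| 4 | Robin | Manchester | 38 | 68.0 | 91.0 |
| 5 | Amal | Cairo | 31 | 61.0 | 91.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |
| 7 | Ronald | Berlin | 34 | 79.0 | NaN |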


In [189]:
#print('df = df.drop(labels=[7])')
students_df.drop(labels=[7],inplace=True)
students_df

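|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 | 71.0 |
| 1 | Ann | Toronto | 28 | 79.0 | 95.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 3 | Yi | Shanghai | 34 | 80.0 | 79.0 |
| 4 | Robin | Manchester | 38 | 68.0 | 91.0 |
| 5 | Amal | Cairo | 31 | 61.0 | 91.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |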


In [184]:
students_df["py-score"]

0    88.0
1    79.0
2    81.0
3    80.0
4    68.0
5    61.0
6    84.0
Name: py-score, dtype: float64

### Inserting/Deleting columns

In [190]:
# assigning to a new column name appends that column at the end
students_df['js-score'] = [71.0, 95.0, 88.0, 79.0, 91.0, 91.0, 80.0]
students_df

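|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 | 71.0 |
| 1 | Ann | Toronto | 28 | 79.0 | 95.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 3 | Yi | Shanghai | 34 | 80.0 | 79.0 |
| 4 | Robin | Manchester | 38 | 68.0 | 91.0 |
| 5 | Amal | Cairo | 31 | 61.0 | 91.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |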


### Inserting in a specific location

In [199]:
#print("df.insert(loc=4, column='django-score', value=np.array([70, 74, 78, 56, 66, 78, 81.0]))")
students_df.insert(loc=4, column='django-score',
          value=np.array([70, 74, 78, 56, 66, 78, 81.0]))
students_df

|   | name | city | age | py-score | django-score | js-score |
|---|---|---|---|---|---|---|
| 101 | Xavier | Mexico City | 41 | 88.0 | 70.0 | 71.0 |
| 102 | Ann | Toronto | 28 | 79.0 | 74.0 | 95.0 |
| 103 | Jana | Prague | 33 | 81.0 | 78.0 | 88.0 |
| 104 | Yi | Shanghai | 34 | 80.0 | 56.0 | 79.0 |
| 105 | Robin | Manchester | 38 | 68.0 | 66.0 | 91.0 |
| 106 | Amal | Cairo | 31 | 61.0 | 78.0 | 91.0 |
| 107 | Nori | Osaka | 37 | 84.0 | 81.0 | 80.0 |


### Dropping a specific column

In [192]:
## axis=0 drops rows (by index label), axis=1 drops columns (by column label)
students_df = students_df.drop(labels=['django-score'], axis=1)

KeyError: "['django-score'] not found in axis"

In [193]:
students_df

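|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 | 71.0 |
| 1 | Ann | Toronto | 28 | 79.0 | 95.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 3 | Yi | Shanghai | 34 | 80.0 | 79.0 |
| 4 | Robin | Manchester | 38 | 68.0 | 91.0 |
| 5 | Amal | Cairo | 31 | 61.0 | 91.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |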
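The `KeyError` in the drop cell above is what pandas raises when the column is already gone, most likely because that cell was run a second time. A sketch on a made-up frame shows the keyword form `columns=` (equivalent to `labels=..., axis=1`) and `errors="ignore"`, which skips labels that do not exist:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Yi"], "py-score": [79.0, 80.0],
                   "django-score": [74.0, 56.0]})

# Same as df.drop(labels=["django-score"], axis=1)
without_django = df.drop(columns=["django-score"])

# Running the drop again would raise KeyError; errors="ignore" makes it a no-op
still_ok = without_django.drop(columns=["django-score"], errors="ignore")

print(list(without_django.columns), list(still_ok.columns))
```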


### Filtering/Boolean Indexing

In [197]:
#print("filter_ = df['py-score'] >= 80")
very_good_students_filter = students_df['py-score'] >= 80
very_good_students_filter

0     True
1    False
2     True
3     True
4    False
5    False
6     True
Name: py-score, dtype: bool

In [198]:
students_df[students_df["py-score"]>=80]

|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 0 | Xavier | Mexico City | 41 | 88.0 | 71.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 3 | Yi | Shanghai | 34 | 80.0 | 79.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |


In [206]:
students_df["js-score-updated"]= students_df["js-score"] + students_df["py-score"]+5

In [108]:
students_df

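|   | index | name | city | age | py-score |
|---|---|---|---|---|---|
| 10 | 101 | Xavier | Mexico City | 41 | 88.0 |
| 11 | 102 | Ann | Toronto | 28 | 79.0 |
| 12 | 103 | Jana | Prague | 33 | 81.0 |
| 13 | 104 | Yi | Shanghai | 34 | 80.0 |
| 14 | 105 | Robin | Manchester | 38 | 68.0 |
| 15 | 106 | Amal | Cairo | 31 | 61.0 |
| 16 | 107 | Nori | Osaka | 37 | 84.0 |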


In [107]:
students_df.groupby('name').sum()

| name | index | age | py-score |
|---|---|---|---|
| Amal | 106 | 31 | 61.0 |
| Ann | 102 | 28 | 79.0 |
| Jana | 103 | 33 | 81.0 |
| Nori | 107 | 37 | 84.0 |
| Robin | 105 | 38 | 68.0 |
| Xavier | 101 | 41 | 88.0 |
| Yi | 104 | 34 | 80.0 |
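Since every student name is unique, `groupby('name').sum()` above simply reproduces each row, and it also sums the meaningless `index` column. Grouping pays off when groups contain several rows. A sketch on a stand-in for the `people_df` table (same values as shown earlier), aggregating only the `age` column:

```python
import pandas as pd

# Stand-in for people_df (values copied from the table above)
people = pd.DataFrame({
    "name": ["Pol", "Javi", "Maria", "Anna", "Anna", "Javi", "Dog"],
    "age": [22, 20, 23, 24, 24, 30, 2],
    "country": ["ES", "ES", "AR", "FR", "UK", "MA", "XX"],
})

# Split rows by name, then compute several statistics of age per group
age_by_name = people.groupby("name")["age"].agg(["count", "mean", "max"])
print(age_by_name)
```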


In [208]:
students_df.age>40

101     True
102    False
103    False
104    False
105    False
106    False
107    False
Name: age, dtype: bool

In [349]:
students_df[students_df.age>40]

|   | name | city | age | py-score | js-score | js-score-updated |
|---|---|---|---|---|---|---|
| 10 | Xavier | Mexico City | 41 | 88.0 | 71.0 | 76.0 |


In [351]:
print('df[filter_]')
students_df[very_good_students_filter]

df[filter_]


|   | name | city | age | py-score | js-score | js-score-updated |
|---|---|---|---|---|---|---|
| 10 | Xavier | Mexico City | 41 | 88.0 | 71.0 | 76.0 |
| 11 | Ann | Toronto | 28 | 90.0 | 95.0 | 100.0 |
| 12 | Jana | Prague | 33 | 81.0 | 88.0 | 93.0 |
| 13 | Yi | Shanghai | 34 | 80.0 | 79.0 | 84.0 |
| 14 | Robin | Manchester | 38 | 68.0 | 91.0 | 96.0 |
| 15 | Amal | Cairo | 31 | 61.0 | 91.0 | 96.0 |
| 16 | Nori | Osaka | 37 | 84.0 | 80.0 | 85.0 |


### Creating powerful filters with Logical operators AND, OR, NOT, XOR

In [206]:
#print("df[(df['py-score'] >= 40) & (df['js-score'] >= 80)]")
students_df[(students_df['py-score'] >= 40) & (students_df['js-score'] >= 80)]

|   | name | city | age | py-score | js-score |
|---|---|---|---|---|---|
| 1 | Ann | Toronto | 28 | 79.0 | 95.0 |
| 2 | Jana | Prague | 33 | 81.0 | 88.0 |
| 4 | Robin | Manchester | 38 | 68.0 | 91.0 |
| 5 | Amal | Cairo | 31 | 61.0 | 91.0 |
| 6 | Nori | Osaka | 37 | 84.0 | 80.0 |
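The cell above only uses AND. The other operators from the heading work the same way on boolean Series: `&` (AND), `|` (OR), `~` (NOT) and `^` (XOR). Python's `and`/`or`/`not` cannot be used here, because they try to reduce a whole Series to a single True/False and raise a ValueError. A sketch on four made-up rows:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Xavier", "Ann", "Jana", "Yi"],
                   "py-score": [88.0, 79.0, 81.0, 80.0],
                   "js-score": [71.0, 95.0, 88.0, 79.0]})

py_ok = df["py-score"] >= 80
js_ok = df["js-score"] >= 80

both = df[py_ok & js_ok]          # AND: both conditions hold
either = df[py_ok | js_ok]        # OR: at least one condition holds
not_py = df[~py_ok]               # NOT: inverts a condition
exactly_one = df[py_ok ^ js_ok]   # XOR: exactly one condition holds

# When combining inline, parenthesize each comparison:
# & | ^ bind more tightly than >=
inline = df[(df["py-score"] >= 80) & (df["js-score"] >= 80)]
print(both, either, not_py, exactly_one, sep="\n\n")
```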


### Working with Missing Data

### np.nan is used to represent missing values

In [207]:
print("df_ = pd.DataFrame({'x': [1, 2, np.nan, 4]})")
df_ = pd.DataFrame({'x': [1, 2, np.nan, 4]})
df_


df_ = pd.DataFrame({'x': [1, 2, np.nan, 4]})


|   | x |
|---|---|
| 0 | 1.0 |
| 1 | 2.0 |
| 2 | NaN |
| 3 | 4.0 |


In [211]:
df_["y"]=[2,np.nan,4,5]
df_

|   | x | y |
|---|---|---|
| 0 | 1.0 | 2.0 |
| 1 | 2.0 | NaN |
| 2 | NaN | 4.0 |
| 3 | 4.0 | 5.0 |
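Before dropping or filling anything, it usually helps to see where the gaps are. `isna()` returns a boolean frame of the same shape, and summing it counts the missing values per column. A sketch rebuilding the `df_` frame above:

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [2, np.nan, 4, 5]})

print(df_.isna())                # True where a value is missing
print(df_.isna().sum())          # number of missing values per column
print(df_.notna().all(axis=1))   # True for rows without any missing value
```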


### Dropping rows that contain missing values

In [208]:
#print("df_.dropna()")
df_.dropna()

|   | x |
|---|---|
| 0 | 1.0 |
| 1 | 2.0 |
| 3 | 4.0 |


### and you can fill the missing values with fillna..

In [219]:
df_ = pd.DataFrame({'x': [1, 2, np.nan, np.nan]})
print('df_.fillna(value=0)\n', df_.fillna(value=0))
# fillna(method=...) is deprecated since pandas 2.1; ffill()/bfill() are the replacements
print("\ndf_.ffill()\n", df_.ffill())   # copy the last valid value forward
print("\ndf_.bfill()\n", df_.bfill())   # copy the next valid value backward (none exists here)

df_.fillna(value=0)
      x
0  1.0
1  2.0
2  0.0
3  0.0

df_.ffill()
      x
0  1.0
1  2.0
2  2.0
3  2.0

df_.bfill()
      x
0  1.0
1  2.0
2  NaN
3  NaN
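`bfill` leaves the last two rows empty because no later value exists to copy. `fillna` also accepts a dict with one fill value per column (for example a column's mean), and `dropna(subset=...)` drops rows only when specific columns are missing. A sketch on the two-column `df_` from earlier:

```python
import numpy as np
import pandas as pd

df_ = pd.DataFrame({"x": [1, 2, np.nan, 4], "y": [2, np.nan, 4, 5]})

# One fill value per column: 0 for x, the mean of the present values for y
filled = df_.fillna({"x": 0, "y": df_["y"].mean()})

# Drop only the rows where column "x" is missing
x_complete = df_.dropna(subset=["x"])

print(filled)
print(x_complete)
```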


In [223]:
people_df

|   | name | age | country |
|---|---|---|---|
| 0 | Pol | 22 | ES |
| 1 | Javi | 20 | ES |
| 2 | Maria | 23 | AR |
| 3 | Anna | 24 | FR |
| 4 | Anna | 24 | UK |
| 5 | Javi | 30 | MA |
| 6 | Dog | 2 | XX |


## value_counts(), describe()

In [222]:
people_df.name.value_counts()

Javi     2
Anna     2
Pol      1
Maria    1
Dog      1
Name: name, dtype: int64
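`value_counts()` can also report relative frequencies, and `describe()` on text columns reports `count`, `unique`, `top` and `freq` instead of numeric statistics. A sketch on a stand-in for `people_df` (same values as the table above):

```python
import pandas as pd

# Stand-in for people_df (values copied from the table above)
people = pd.DataFrame({
    "name": ["Pol", "Javi", "Maria", "Anna", "Anna", "Javi", "Dog"],
    "age": [22, 20, 23, 24, 24, 30, 2],
    "country": ["ES", "ES", "AR", "FR", "UK", "MA", "XX"],
})

print(people["country"].value_counts())                # absolute frequencies
print(people["country"].value_counts(normalize=True))  # shares that sum to 1

# describe() on text-only columns: count, unique, top, freq
summary = people[["name", "country"]].describe()
print(summary)
```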