# **Day 2 - Data Manipulation**

## Contents

- **Part 1 - Numpy (45 min)**
    - 1.0. Lists and Numpy
    - 1.1. 1D arrays
    - 1.2. 2D arrays
    - 1.3. Basic Statistics
- **Part 2 - Pandas (75 min)**
    - 2.1. Dataframes Methods and Attributes
    - 2.2. Import Data
    - 2.3. Explore Dataframes
    - 2.4. Dataframe groupby Method
    - 2.5. Dataframe: Filtering
    - 2.6. Dataframe: Slicing
    - 2.7. Sorting Dataframes
    - 2.8. Handling Missing Values
    - 2.9. Merge Dataframes
- **Test (15 min)**

# Import Python Libraries

**NumPy:** (https://numpy.org/)
- Introduces objects for multidimensional arrays and matrices, as well as functions that make it easy to perform advanced mathematical and statistical operations on those objects.
- Provides vectorization of mathematical operations on arrays and matrices, which significantly improves performance.
- Many other Python libraries are built on NumPy.

**Pandas:** (https://pandas.pydata.org/)
- Adds data structures and tools designed to work with table-like data.
- Provides tools for data manipulation: reshaping, merging, sorting, slicing, aggregation etc.
- Allows handling missing data.

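**SQLAlchemy:** (https://www.sqlalchemy.org/)
- Provides tools for working with SQL databases from Python; in this notebook, its `create_engine` function is used so that pandas can read tables from a database.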

In [1]:
import numpy as np  # import numpy library under the name or alias np
import pandas as pd  # import pandas library under the name or alias pd
from sqlalchemy import create_engine  # import the create_engine function from the sqlalchemy library

# Part 1. Numpy

## 1.0 Lists and Numpy

Lists are:
- Collections of values
- Able to hold different types of data
- Mutable: elements can be changed, added, or removed
- Not designed for element-wise mathematical operations, which is where NumPy arrays come in

In [2]:
height = [1.73, 1.68, 1.71, 1.89, 1.79]
print(height)
print(type(height))

[1.73, 1.68, 1.71, 1.89, 1.79]
<class 'list'>


In [3]:
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
print(weight)
print(type(weight))

[65.4, 59.2, 63.6, 88.4, 68.7]
<class 'list'>


### What is numpy?
NumPy is the fundamental package for scientific computing in Python. Its name stands for "Numerical Python". It provides multidimensional array objects and a collection of routines for processing those arrays.

By using NumPy, a developer can perform the following operations:
- Mathematical and logical operations on arrays.
- Fourier transforms and routines for shape manipulation.
- Operations related to linear algebra. NumPy has built-in functions for linear algebra and random number generation.

Compared with lists, NumPy arrays:
- Can contain only a single data type
- Use less memory
- Apply calculations to entire arrays at once (vectorization)
- Are easy to use and fast
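To make the single-type and whole-array points concrete, here is a minimal sketch; the `prices` values and the 12% tax rate are made-up numbers used only for illustration.

```python
import numpy as np

# An array holds a single dtype: the integers 1 and 3 are upcast to floats
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Vectorized calculation: one expression is applied to every element at once
prices = np.array([10.0, 20.0, 30.0])  # hypothetical values
with_tax = prices * 1.12               # no explicit Python loop needed
print(with_tax)

# The equivalent with a plain list needs a loop (here, a list comprehension)
prices_list = [10.0, 20.0, 30.0]
with_tax_list = [p * 1.12 for p in prices_list]
print(with_tax_list)
```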

**NumPy** is often used along with packages like **SciPy** (Scientific Python) and **Matplotlib** (plotting library). This combination is widely used as a replacement for MATLAB, a popular platform for technical computing. Unlike MATLAB, however, Python is a modern, complete, general-purpose programming language.
NumPy is also open source, which is an added advantage.

![Picture title](Recursos/Numpy.png)

## 1.1. 1D arrays
### Numpy array creation
Like lists, arrays are collections of items, but all items must be of the same type (e.g., all numbers or all strings). To create and use arrays in Python, you need to use the previously imported library `numpy`.


In [5]:
# Create an array from scratch
my_array = np.array([-10, 30, 60, 90, 120, 150, -100])
print(my_array)

[ -10   30   60   90  120  150 -100]


In [6]:
# Alternative options to create arrays from scratch
# Create a 1D array of integers from one to ten using arange (the stop value 11 is excluded)
print('Alternative 1:', np.arange(1, 11))

# Create a 2x4 array of random numbers in the interval [0, 1) using random
print('Alternative 2:', np.random.random((2, 4)))

Alternative 1: [ 1  2  3  4  5  6  7  8  9 10]
Alternative 2: [[0.83131076 0.04359541 0.22171181 0.57828331]
 [0.17080531 0.4333413  0.11809281 0.21314899]]


In [96]:
# Transform the lists height and weight into numpy arrays
np_height = np.array(height)
print(type(np_height),np_height)
np_weight = np.array(weight)
print(type(np_weight), np_weight)
np_height

<class 'numpy.ndarray'> [1.73 1.68 1.71 1.89 1.79]
<class 'numpy.ndarray'> [65.4 59.2 63.6 88.4 68.7]


array([1.73, 1.68, 1.71, 1.89, 1.79])

In [10]:
# NumPy remark: arrays contain only one type
np.array([1.0, "is", True])
# the float and the Boolean were converted to strings (dtype '<U32')

array(['1.0', 'is', 'True'], dtype='<U32')

### Subsetting and modifying 1D Numpy arrays

In [4]:
# the following code raises a TypeError: lists do not support ** or element-wise /
#bmi = weight / height ** 2

# How to solve it: use NumPy arrays


In [97]:
bmi = np_weight / np_height ** 2
#bmi = np.array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
bmi

array([21.85171573, 20.97505669, 21.75028214, 24.7473475 , 21.44127836])
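A minimal sketch of why the list version fails while the array version works, using the first three height and weight values from above. Python evaluates `height ** 2` first, and that step already fails for a list.

```python
import numpy as np

height = [1.73, 1.68, 1.71]
weight = [65.4, 59.2, 63.6]

# With plain lists, ** and / are not applied element by element
try:
    bmi = weight / height ** 2
except TypeError as err:
    print("Lists fail:", err)

# Converting to arrays makes / and ** work element-wise
np_height = np.array(height)
np_weight = np.array(weight)
bmi = np_weight / np_height ** 2
print(np.round(bmi, 2))
```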

In [12]:
# Subsetting
# obtain the last element
bmi[-1]

21.44127836209856

In [13]:
# identify the values greater than 21


array([ True, False,  True,  True,  True])

In [14]:
# create the array bmiplus with the elements whose values are greater than 21

bmiplus


array([21.85171573, 21.75028214, 24.7473475 , 21.44127836])

In [15]:
print(my_array)
my_array[0] = 0 # modify first element
my_array[-1] = 180 # modify last element
print(my_array)

[ -10   30   60   90  120  150 -100]
[  0  30  60  90 120 150 180]


In [16]:
my_array = np.append(my_array, [240, 270, 300, 330, 360]) # append elements
print(my_array)

[  0  30  60  90 120 150 180 240 270 300 330 360]


In [17]:
my_array = np.insert(my_array, 7, 210) # insert element at index 7
print(my_array) 

[  0  30  60  90 120 150 180 210 240 270 300 330 360]


In [18]:
my_array = np.delete(my_array, -1) # delete last element
print(my_array)

[  0  30  60  90 120 150 180 210 240 270 300 330]


### Operations with 1D arrays

In [98]:
print('number of elements in array =', my_array.shape[0]) # the [0] retrieves element 0 of shape

number of elements in array = 12


In [99]:
print('number of elements in array =', my_array.size) # print the number of elements in array

number of elements in array = 12


In [23]:
# Different types: different behaviour
print(height)
print(height + height)
print(np_height)
print(np_height+np_height)

[1.73, 1.68, 1.71, 1.89, 1.79]
[1.73, 1.68, 1.71, 1.89, 1.79, 1.73, 1.68, 1.71, 1.89, 1.79]
[1.73 1.68 1.71 1.89 1.79]
[3.46 3.36 3.42 3.78 3.58]


## 1.2. 2D arrays

In [24]:
# create a 2D array
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])
np_2d

array([[ 1.73,  1.68,  1.71,  1.89,  1.79],
       [65.4 , 59.2 , 63.6 , 88.4 , 68.7 ]])

### Subsetting and modifying 2D Numpy arrays

In [100]:
# use the shape attribute of np_2d
print() #complete the code to print the array shape




![Picture title](Recursos/2Darray.png)

In [26]:
#get the first row of the array
np_2d[0]

array([1.73, 1.68, 1.71, 1.89, 1.79])

In [27]:
# get the fourth element of the first row
np_2d[0,3]

1.89

In [28]:
# get the first three elements from both rows
np_2d[:, 0:3]

array([[ 1.73,  1.68,  1.71],
       [65.4 , 59.2 , 63.6 ]])

![Picture title](Recursos/SinFunction.png)

In [102]:
my_sines = np.sin(np.radians(my_array)) # compute the sine of the elements in my_array
print(np.round(my_sines,2)) # print the array with just two decimal places

[ 0.    0.5   0.87  1.    0.87  0.5   0.   -0.5  -0.87 -1.   -0.87 -0.5 ]


In [30]:
# create a 3 x 3 array
array_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array_a)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [104]:
# array plus scalar


array([[ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [105]:
print(array_a * 2) # array times scalar
print()
print(array_a / 2) # array divided by scalar
print()
print(array_a ** 2) # array raised to the power of the scalar

[[ 2  4  6]
 [ 8 10 12]
 [14 16 18]]

[[0.5 1.  1.5]
 [2.  2.5 3. ]
 [3.5 4.  4.5]]

[[ 1  4  9]
 [16 25 36]
 [49 64 81]]


In [33]:
# Create another 3 x 3 array
array_b = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])

print() # complete the code: element-wise sum of the two arrays

[[10 10 10]
 [10 10 10]
 [10 10 10]]


### Comparison operators and Boolean arrays

Comparison operators also work on arrays. Suppose we have two arrays: the months of the year and their precipitation values in mm. To find the months with precipitation higher than 100 mm, and their precipitation values, we can do the following:

In [36]:
months = np.array(['January', 'February', 'March', 'April', 'May', 'June', 
                   'July', 'August', 'September', 'October', 'November', 'December']) # array of months
precip = np.array([142, 89, 114, 74, 53, 38, 13, 25, 43, 109, 165, 137]) # monthly precipitation values in mm

high_precip = precip > 100 # Boolean array for monthly precipitation values more than 100 mm
low_precip = precip < 100

print(high_precip, '\n') # print Boolean array

print('Months with precipitation > 100 mm:', months[high_precip]) # print months with precipitation > 100 mm
print('Values of precipitation in these months:', precip[high_precip]) # print these months precipitation values

print('Months with precipitation < 100 mm:', months[low_precip]) # print months with precipitation < 100 mm
print('Values of precipitation in these months:', precip[low_precip]) # print these months precipitation values

[ True False  True False False False False False False  True  True  True] 

Months with precipitation > 100 mm: ['January' 'March' 'October' 'November' 'December']
Values of precipitation in these months: [142 114 109 165 137]
Months with precipitation < 100 mm: ['February' 'April' 'May' 'June' 'July' 'August' 'September']
Values of precipitation in these months: [89 74 53 38 13 25 43]
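One caveat about the two masks above: `precip > 100` and `precip < 100` together would skip a month with exactly 100 mm. A sketch using the complement operator `~`, plus `&` to combine conditions (the 40 mm lower bound is an arbitrary threshold chosen only for illustration):

```python
import numpy as np

months = np.array(['January', 'February', 'March', 'April', 'May', 'June',
                   'July', 'August', 'September', 'October', 'November', 'December'])
precip = np.array([142, 89, 114, 74, 53, 38, 13, 25, 43, 109, 165, 137])

high_precip = precip > 100
# ~ negates a Boolean array, so this means "precip <= 100": no month is missed
not_high = ~high_precip

# Combine conditions with & (and) or | (or); each comparison needs parentheses
moderate = (precip >= 40) & (precip <= 100)

print(months[not_high])
print(months[moderate], precip[moderate])
```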


## 1.3. Basic Statistics

In [37]:
# Create random data
# np.random.normal(mean, standard deviation, number of samples)
rd_height = np.round(np.random.normal(1.75, 0.20, 200), 2)
rd_weight = np.round(np.random.normal(60.32, 15, 200), 2)
np_city = np.column_stack((rd_height, rd_weight))
np_city

array([[ 2.01, 58.74],
       [ 1.89, 60.26],
       [ 1.54, 81.07],
       [ 1.6 , 47.65],
       [ 1.6 , 50.87],
       [ 2.06, 70.43],
       [ 1.34, 62.5 ],
       [ 1.95, 47.95],
       [ 1.74, 67.7 ],
       [ 1.96, 75.16],
       [ 1.82, 57.07],
       [ 1.99, 42.65],
       [ 1.6 , 53.46],
       [ 1.32, 60.79],
       [ 2.03, 53.69],
       [ 1.83, 61.  ],
       [ 1.58, 61.5 ],
       [ 1.54, 47.68],
       [ 1.71, 59.04],
       [ 2.05, 56.19],
       [ 1.83, 28.78],
       [ 1.74, 44.57],
       [ 1.95, 73.78],
       [ 1.57, 37.69],
       [ 1.76, 59.59],
       [ 1.86, 55.92],
       [ 1.82, 80.8 ],
       [ 1.97, 72.9 ],
       [ 1.58, 34.19],
       [ 1.49, 61.61],
       [ 1.81, 44.51],
       [ 1.63, 47.74],
       [ 1.96, 45.87],
       [ 1.89, 52.28],
       [ 1.37, 53.71],
       [ 1.25, 50.37],
       [ 1.78, 50.33],
       [ 1.95, 48.  ],
       [ 1.82, 70.64],
       [ 2.  , 37.  ],
       [ 1.71, 45.06],
       [ 1.95, 75.76],
       [ 1.8 , 35.07],
       ...

In [38]:
# Get the mean of height and weight (column 0 and column 1)
# Using basic operations: column sums divided by the number of rows
mean = np_city.sum(axis=0) / np_city.shape[0]
print(mean)

#using mean function
print(np.mean(np_city[:,0]))
print(np.mean(np_city[:,1]))

[ 1.74205 59.4416 ]
1.7420499999999999
59.4416


In [39]:
# Get the median of height
print(np.median(np_city[:,0]))
# Get the standard deviation of height
print(np.std(np_city[:,0]))

1.75

In [42]:
teachers_publications= np.array([[2,4,4],[0,2,4],[2,4,2],[3,1,3],[5,3,2]])
teachers_publications

array([[2, 4, 4],
       [0, 2, 4],
       [2, 4, 2],
       [3, 1, 3],
       [5, 3, 2]])

In [43]:
print(teachers_publications.sum(axis=0)) # total publications per teacher (column sums)
print(teachers_publications.sum(axis=1)) # total publications per year (row sums)

print(teachers_publications.min()) # minimum number in the whole array
print(teachers_publications.max()) # maximum number in the whole array

print(teachers_publications.min(axis=1)) # minimum number per year
print(teachers_publications.max(axis=1)) # maximum number per year

[12 14 15]
[10  6  8  7 10]
0
5
[2 0 2 1 2]
[4 4 4 3 5]


# Part 2. Pandas

## 2.1. Dataframes Methods and Attributes

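Python objects have attributes and methods.
**Attributes** store information about the object and are accessed without parentheses (e.g. `df.shape`). **Methods** are functions that belong to the object and, unlike attributes, are called with parentheses (e.g. `df.head()`).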
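A small sketch contrasting attributes and methods on a toy DataFrame (the column names and values are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"item": ["a", "b", "c"], "value": [1.5, 2.5, 3.5]})

# Attributes: no parentheses, they describe the object
print(df.shape)    # (rows, columns)
print(df.columns)  # column labels

# Methods: called with parentheses, they perform an action and may take arguments
print(df.head(2))           # first two rows
print(df["value"].mean())   # mean of one column
```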


## 2.2. Import and Read Data

- Import data from:
    - CSV
    - XLSX
    - SQL

### 2.2.1. Import a CSV File
To import a CSV file, we use the pandas *read_csv* function.
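As a minimal, self-contained sketch, `read_csv` can also read from an in-memory text buffer. The three-column CSV below is a hypothetical excerpt built from values that appear in this section's output; in practice you would pass a file path instead.

```python
import io
import pandas as pd

# A tiny CSV kept in memory (hypothetical excerpt)
csv_text = """Country,Year,Total
Afghanistan,1750,0.0
Global,2020,35264.085734
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # pandas infers one type per column
```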

In [44]:
# Import and read a csv file


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.000000,,,,,,,
1,Afghanistan,AFG,1751,0.000000,,,,,,,
2,Afghanistan,AFG,1752,0.000000,,,,,,,
3,Afghanistan,AFG,1753,0.000000,,,,,,,
4,Afghanistan,AFG,1754,0.000000,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423


### 2.2.2. Import a xlsx File
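To import an xlsx file, we use the pandas *read_excel* function. Reading and writing .xlsx files requires the **openpyxl** package, which is installed in the cell below.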

In [45]:
# Install openpyxl, which pandas uses to read and write .xlsx files
!pip install openpyxl
# Convert the csv file to an xlsx file (uncomment to run once)
#df1.to_excel("Datos/CO2Emissions.xlsx")

Collecting openpyxl
  Downloading openpyxl-3.1.2-py2.py3-none-any.whl (249 kB)
Collecting et-xmlfile
  Downloading et_xmlfile-1.1.0-py3-none-any.whl (4.7 kB)
Installing collected packages: et-xmlfile, openpyxl
Successfully installed et-xmlfile-1.1.0 openpyxl-3.1.2

In [46]:
# Import and read a xlsx file


Unnamed: 0.1,Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,0,Afghanistan,AFG,1750,0.000000,,,,,,,
1,1,Afghanistan,AFG,1751,0.000000,,,,,,,
2,2,Afghanistan,AFG,1752,0.000000,,,,,,,
3,3,Afghanistan,AFG,1753,0.000000,,,,,,,
4,4,Afghanistan,AFG,1754,0.000000,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
63099,63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423
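The extra `Unnamed: 0` column in the xlsx output above is consistent with the DataFrame index having been written to the file as an unlabeled column, which pandas then reads back as ordinary data. Writing with `index=False`, or reading with `index_col=0`, avoids it. To stay runnable without openpyxl, this sketch shows the same behaviour with an in-memory CSV; `to_excel` and `read_excel` accept the same `index` and `index_col` parameters. The helper `round_trip` is hypothetical, written only for this illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"Country": ["Afghanistan", "Global"], "Year": [1750, 2020]})

def round_trip(frame, write_index, index_col=None):
    """Hypothetical helper: write frame to an in-memory CSV and read it back."""
    buf = io.StringIO()
    frame.to_csv(buf, index=write_index)
    buf.seek(0)
    return pd.read_csv(buf, index_col=index_col)

with_index = round_trip(df, write_index=True)               # index saved as a column
no_index = round_trip(df, write_index=False)                # option 1: skip the index
index_back = round_trip(df, write_index=True, index_col=0)  # option 2: read it as index

print(with_index.columns.tolist())  # ['Unnamed: 0', 'Country', 'Year']
print(no_index.columns.tolist())    # ['Country', 'Year']
print(index_back.columns.tolist())  # ['Country', 'Year']
```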


### 2.2.3. Import Data from Database
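To import tables from a database, we can use the **SQLAlchemy** library: its *create_engine* function creates an engine that manages the connection to the database. Then, we can use the pandas *read_sql_query* function to run an SQL query and load the result into a DataFrame.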

**SQLiteStudio:** https://sqlitestudio.pl/
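A self-contained sketch of the same workflow: it writes a few rows taken from this section's output to a throwaway in-memory SQLite database and queries them back. To avoid depending on a database file, it uses the standard-library `sqlite3` connection, which pandas also accepts; with SQLAlchemy you would pass the engine returned by `create_engine` instead of `conn`. The table name `emissions` is an assumption.

```python
import sqlite3
import pandas as pd

# An in-memory SQLite database, so no file on disk is needed
conn = sqlite3.connect(":memory:")

df = pd.DataFrame({"Country": ["Ecuador", "Ecuador", "Global"],
                   "Year": [2019, 2020, 2020],
                   "Total": [40.264468, 34.457458, 35264.085734]})
df.to_sql("emissions", conn, index=False)  # hypothetical table name

query = "SELECT Country, Year, Total FROM emissions WHERE Year = 2020"
result = pd.read_sql_query(query, conn)
print(result)
conn.close()
```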

In [47]:
# Import tables from a SQL Database

# Create an engine connected to the SQLite database file
engine = create_engine('sqlite:///Datos/CO2Emissions.db')


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0,,,,,,,
1,Afghanistan,AFG,1751,0,,,,,,,
2,Afghanistan,AFG,1752,0,,,,,,,
3,Afghanistan,AFG,1753,0,,,,,,,
4,Afghanistan,AFG,1754,0,,,,,,,


## 2.3. Explore Dataframes

- `head()` method: show the first 5 rows
- `dtypes` attribute: check the data type of each column
- Other attributes: `columns`, `ndim`, `size`, `shape`, `values`
- Other methods: `info()`, `describe()`, `median()`, `mean()`, `std()`, `max()`, `min()`
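A sketch of these attributes and methods on a hypothetical three-row DataFrame (the Ecuador oil values come from this section's output; the missing Peru value is deliberate, to show how `info()` and `describe()` count non-null entries):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Country": ["Ecuador", "Ecuador", "Peru"],
                   "Year": [2019, 2020, 2020],
                   "Oil": [34.85152, 29.006557, np.nan]})

print(df.head())          # first rows (up to 5)
print(df.dtypes)          # data type of each column
print(df.shape)           # (number of rows, number of columns)
print(df.ndim, df.size)   # number of dimensions, number of cells
df.info()                 # non-null counts and dtypes per column
print(df.describe())      # count, mean, std, min, quartiles, max of numeric columns
```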

In [48]:
# List first 5 records


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.0,,,,,,,
1,Afghanistan,AFG,1751,0.0,,,,,,,
2,Afghanistan,AFG,1752,0.0,,,,,,,
3,Afghanistan,AFG,1753,0.0,,,,,,,
4,Afghanistan,AFG,1754,0.0,,,,,,,


In [49]:
# Check types for all columns
df1.dtypes

Country                object
ISO 3166-1 alpha-3     object
Year                    int64
Total                 float64
Coal                  float64
Oil                   float64
Gas                   float64
Cement                float64
Flaring               float64
Other                 float64
Per Capita            float64
dtype: object

In [50]:
# Show column names


Index(['Country', 'ISO 3166-1 alpha-3', 'Year', 'Total', 'Coal', 'Oil', 'Gas',
       'Cement', 'Flaring', 'Other', 'Per Capita'],
      dtype='object')

In [51]:
# Show dimension of dataframe


(63104, 11)

In [52]:
# Show summary of each column


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63104 entries, 0 to 63103
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Country             63104 non-null  object 
 1   ISO 3166-1 alpha-3  61472 non-null  object 
 2   Year                63104 non-null  int64  
 3   Total               62904 non-null  float64
 4   Coal                21744 non-null  float64
 5   Oil                 21717 non-null  float64
 6   Gas                 21618 non-null  float64
 7   Cement              20814 non-null  float64
 8   Flaring             21550 non-null  float64
 9   Other               1620 non-null   float64
 10  Per Capita          18974 non-null  float64
dtypes: float64(8), int64(1), object(2)
memory usage: 5.3+ MB


In [53]:
# Show descriptive statistics


Unnamed: 0,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
count,63104.0,62904.0,21744.0,21717.0,21618.0,20814.0,21550.0,1620.0,18974.0
mean,1885.5,55.224788,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
std,78.519728,824.845435,598.986992,519.034563,247.674772,50.30577,16.727067,39.034073,17.432815
min,1750.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,1817.75,0.0,0.0,0.0916,0.0,0.0,0.0,0.520885,0.197866
50%,1885.5,0.0,0.271852,1.04424,0.0,0.022756,0.0,1.255329,1.303949
75%,1953.25,0.549342,6.736411,8.339752,0.581628,0.568502,0.0,4.385471,5.077994
max,2021.0,37123.850352,15051.51277,12345.653374,7921.829472,1672.592372,439.253991,306.638573,834.192642


In [54]:
# Show min, max, mean, median, and std of a column
# Min value of each column
print(" ") # White space
# Identify the min amount of CO2 emissions from the coal industry


Country       Afghanistan
Year                 1750
Total                 0.0
Coal                  0.0
Oil                   0.0
Gas                   0.0
Cement                0.0
Flaring               0.0
Other                 0.0
Per Capita            0.0
dtype: object
 
0.0


In [55]:
# Identify the max amount of CO2 emissions from coal


The max amount of CO2 from coal: 15051.51 Mt


## 2.4. Dataframe groupby Method

Using the `groupby` method we can:
- Split the data into groups based on some criteria
- Calculate statistics (or apply a function) to each group
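A minimal split-apply sketch on a hypothetical four-row DataFrame (values taken from this section's output for Ecuador and Global):

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Ecuador", "Ecuador", "Global", "Global"],
                   "Year": [2019, 2020, 2019, 2020],
                   "Coal": [0.0, 0.0, 14725.978025, 14174.564010],
                   "Oil": [34.85152, 29.006557, 12345.653374, 11191.808551]})

# Split by Country, then apply mean to each group's Coal and Oil columns
by_country = df.groupby("Country")[["Coal", "Oil"]].mean()
print(by_country)

# Several statistics at once with agg
print(df.groupby("Country")["Oil"].agg(["min", "max", "mean"]))
```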

In [56]:
# Group data using a categorical column



<pandas.core.groupby.generic.SeriesGroupBy object at 0x7f475bbbd790>

In [57]:
# Calculate mean value of CO2 emissions by country



Unnamed: 0_level_0,Coal,Oil,Gas
Country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Afghanistan,0.790089,1.802428,0.277625
Albania,0.750357,2.088072,0.187722
Algeria,1.108882,15.600961,20.324051
Andorra,0.000000,0.483890,0.000000
Angola,0.022233,4.817033,0.640445
...,...,...,...
Viet Nam,17.438277,10.834602,2.387313
Wallis and Futuna Islands,0.000000,0.024157,0.000000
Yemen,0.069130,8.265044,0.189377
Zambia,1.665252,1.675202,0.000000


In [58]:
# Show max CO2 emissions by year


Unnamed: 0,Year,Coal,Oil,Gas
0,1750,9.350528,0.000000,0.000000
1,1751,9.350528,0.000000,0.000000
2,1752,9.354192,0.000000,0.000000
3,1753,9.354192,0.000000,0.000000
4,1754,9.357856,0.000000,0.000000
...,...,...,...,...
267,2017,14506.973805,12242.627935,7144.928128
268,2018,14746.830688,12266.016285,7529.846784
269,2019,14725.978025,12345.653374,7647.528220
270,2020,14174.564010,11191.808551,7556.290283


## 2.5. Dataframe: Filtering

To subset the data we can apply Boolean indexing. This indexing is commonly known as a filter.
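A sketch of Boolean indexing on a hypothetical four-row DataFrame (values from this section's output). A comparison produces a Boolean Series, and passing it inside `[]` keeps only the `True` rows; conditions are combined with `&` or `|`, each wrapped in parentheses.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Ecuador", "Ecuador", "Global", "Global"],
                   "Year": [2019, 2020, 2019, 2020],
                   "Oil": [34.85152, 29.006557, 12345.653374, 11191.808551]})

# A Boolean Series: True where the condition holds
mask = df["Oil"] > 100
print(mask)

# Keep only the rows where the mask is True
high_oil = df[mask]
print(high_oil)

# Combine conditions with & (and) and | (or); wrap each one in parentheses
ecuador_2020 = df[(df["Country"] == "Ecuador") & (df["Year"] > 2019)]
print(ecuador_2020)
```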

In [59]:
# Show CO2 Emissions from Oil industry greater than 100 Mt



Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
3241,Australia,AUS,1999,343.488633,183.215379,100.399487,45.548715,3.519991,7.165139,3.639921,18.269805
3242,Australia,AUS,2000,349.635487,185.781229,102.337360,46.593883,3.620856,7.952858,3.349301,18.384487
3243,Australia,AUS,2001,357.132367,190.066589,102.582541,49.310577,3.540903,8.172284,3.459474,18.554121
3244,Australia,AUS,2002,361.540810,191.131030,104.516285,50.573495,3.487738,8.007496,3.824767,18.563550
3245,Australia,AUS,2003,369.279512,195.370417,109.570680,49.701209,3.584016,7.157978,3.895211,18.746105
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423


In [60]:
# CO2 Emissions From Oil and Gas industries greater than 100 Mt



Unnamed: 0,Country,Year,Oil,Gas
9475,Canada,1977,233.700912,101.954464
9476,Canada,1978,233.823426,104.551318
9477,Canada,1979,244.293536,112.572736
9478,Canada,1980,241.294388,112.040529
9479,Canada,1981,227.908128,110.026256
...,...,...,...,...
63099,Global,2017,12242.627935,7144.928128
63100,Global,2018,12266.016285,7529.846784
63101,Global,2019,12345.653374,7647.528220
63102,Global,2020,11191.808551,7556.290283


In [61]:
# Ecuador's CO2 emissions after 2010



Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
16037,Ecuador,ECU,2011,37.398376,0.0,32.877072,1.047904,2.238632,1.234768,,2.454328
16038,Ecuador,ECU,2012,37.408786,0.0,32.1516,1.399648,2.35335,1.504188,,2.415982
16039,Ecuador,ECU,2013,39.654978,0.0,33.961616,1.590176,2.612421,1.490765,,2.522102
16040,Ecuador,ECU,2014,43.73136,0.0,37.61096,1.63048,2.602494,1.887426,,2.740405
16041,Ecuador,ECU,2015,41.275735,0.0,35.442329,1.535077,2.3107,1.98763,,2.54853
16042,Ecuador,ECU,2016,40.177794,0.0,34.29504,1.656128,2.188461,2.038164,,2.443966
16043,Ecuador,ECU,2017,40.075534,0.0,34.284503,1.538736,2.243666,2.00863,,2.400172
16044,Ecuador,ECU,2018,38.245122,0.0,32.983328,1.359344,2.271268,1.631182,,2.247641
16045,Ecuador,ECU,2019,40.264468,0.0,34.85152,1.212896,2.47237,1.727682,,2.321556
16046,Ecuador,ECU,2020,34.457458,0.0,29.006557,1.030269,2.47237,1.948263,,1.95908


## 2.6. Dataframe: Slicing

There are a number of ways to subset a DataFrame:
- One or more columns
- One or more rows
- A subset of rows and columns
- Using the loc and iloc indexers


Rows and columns can be selected by their position or by their label.

When selecting one column, it is possible to use a single set of brackets, but the resulting object will be a Series (not a DataFrame):
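A quick sketch of the difference between single and double brackets, on a hypothetical two-row DataFrame (cement values from this section's output):

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Ecuador", "Global"],
                   "Year": [2020, 2020],
                   "Cement": [2.47237, 1637.537532]})

one_col = df["Cement"]               # single brackets -> Series
one_col_df = df[["Cement"]]          # double brackets -> DataFrame with one column
many_cols = df[["Country", "Year"]]  # a list of labels -> DataFrame

print(type(one_col).__name__, type(one_col_df).__name__, type(many_cols).__name__)
```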

In [62]:
# Select one column


0                NaN
1                NaN
2                NaN
3                NaN
4                NaN
            ...     
63099    1507.923185
63100    1569.218392
63101    1617.506786
63102    1637.537532
63103    1672.592372
Name: Cement, Length: 63104, dtype: float64

When we need to select more than one column and/or make the output to be a DataFrame, we should use double brackets:

In [63]:
# Select many columns


Unnamed: 0,Country,Year,Oil
0,Afghanistan,1750,
1,Afghanistan,1751,
2,Afghanistan,1752,
3,Afghanistan,1753,
4,Afghanistan,1754,
...,...,...,...
63099,Global,2017,12242.627935
63100,Global,2018,12266.016285
63101,Global,2019,12345.653374
63102,Global,2020,11191.808551


If we need to select a range of rows, we can specify the range using ":"

In [64]:
# Selecting rows by their position



Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.0,,,,,,,
1,Afghanistan,AFG,1751,0.0,,,,,,,
2,Afghanistan,AFG,1752,0.0,,,,,,,
3,Afghanistan,AFG,1753,0.0,,,,,,,
4,Afghanistan,AFG,1754,0.0,,,,,,,
5,Afghanistan,AFG,1755,0.0,,,,,,,
6,Afghanistan,AFG,1756,0.0,,,,,,,
7,Afghanistan,AFG,1757,0.0,,,,,,,
8,Afghanistan,AFG,1758,0.0,,,,,,,
9,Afghanistan,AFG,1759,0.0,,,,,,,


**Notice that the first row has position 0 and the last value in the range is omitted: a 0:20 range returns the first 20 rows, at positions 0 through 19.**
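The "last value is omitted" rule applies to position-based slicing (`iloc`). `loc` slices by label and includes the end label. A sketch on a hypothetical ten-row DataFrame with the default index 0..9:

```python
import pandas as pd

df = pd.DataFrame({"Year": list(range(1750, 1760))})  # default index 0..9

# iloc uses positions; the end of the slice is excluded
print(len(df.iloc[0:5]))  # 5 rows: positions 0..4

# loc uses labels; the end label is included
print(len(df.loc[0:5]))   # 6 rows: labels 0..5

# loc picks columns by name, iloc by position
print(df.loc[0:2, "Year"].tolist())  # [1750, 1751, 1752]
print(df.iloc[0:2, 0].tolist())      # [1750, 1751]
```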

In [65]:
# Selecting rows and columns using loc method


Unnamed: 0,Country,Year,Oil
0,Afghanistan,1750,
1,Afghanistan,1751,
2,Afghanistan,1752,
3,Afghanistan,1753,
4,Afghanistan,1754,
5,Afghanistan,1755,
6,Afghanistan,1756,
7,Afghanistan,1757,
8,Afghanistan,1758,
9,Afghanistan,1759,


In [66]:
# Selecting rows and columns using iloc method



Unnamed: 0,Country,Year,Oil
0,Afghanistan,1750,
1,Afghanistan,1751,
2,Afghanistan,1752,
3,Afghanistan,1753,
4,Afghanistan,1754,
5,Afghanistan,1755,
6,Afghanistan,1756,
7,Afghanistan,1757,
8,Afghanistan,1758,
9,Afghanistan,1759,


In [67]:
# Other Examples
display(df1.iloc[-1]) # Last row
display(df1.iloc[0:2, :]) # First two rows and all columns
display(df1.iloc[:, 0]) # All rows and first column

Country                     Global
ISO 3166-1 alpha-3             WLD
Year                          2021
Total                 37123.850352
Coal                  14979.598083
Oil                   11837.159116
Gas                    7921.829472
Cement                 1672.592372
Flaring                 416.525563
Other                   296.145746
Per Capita                4.693699
Name: 63103, dtype: object

Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.0,,,,,,,
1,Afghanistan,AFG,1751,0.0,,,,,,,


0        Afghanistan
1        Afghanistan
2        Afghanistan
3        Afghanistan
4        Afghanistan
            ...     
63099         Global
63100         Global
63101         Global
63102         Global
63103         Global
Name: Country, Length: 63104, dtype: object

## 2.7. Sorting Dataframes

We can sort the data by the values in one or more columns using the **sort_values()** method. By default, sorting is done in ascending order and a new DataFrame is returned, leaving the original unchanged.
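A sketch of sorting by one and by two columns, on a hypothetical three-row DataFrame (totals from this section's output). Passing a list to `ascending` sets the direction per column.

```python
import pandas as pd

df = pd.DataFrame({"Country": ["Global", "Ecuador", "Ecuador"],
                   "Year": [2020, 2020, 2019],
                   "Total": [35264.085734, 34.457458, 40.264468]})

# Ascending by one column; df itself is left unchanged
by_total = df.sort_values("Total")
print(by_total)

# By two columns: Country ascending, then Year descending within each country
by_country_year = df.sort_values(["Country", "Year"], ascending=[True, False])
print(by_country_year)
```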

In [68]:
# Create a new data frame from the original sorted by a column


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.0,,,,,,,
42704,Papua New Guinea,PNG,1750,0.0,,,,0.0,,,
42976,Paraguay,PRY,1750,0.0,,,,,,,
43248,Peru,PER,1750,0.0,,,,,,,
43520,Philippines,PHL,1750,0.0,,,,,,,


In [69]:
# Sorting by two or more columns


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.000000,,,,,,,
1,Afghanistan,AFG,1751,0.000000,,,,,,,
2,Afghanistan,AFG,1752,0.000000,,,,,,,
3,Afghanistan,AFG,1753,0.000000,,,,,,,
4,Afghanistan,AFG,1754,0.000000,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
62555,Zimbabwe,ZWE,2017,9.596071,5.900452,3.226752,0.0,0.468867,0.0,,0.650533
62556,Zimbabwe,ZWE,2018,11.795478,7.177776,4.059712,0.0,0.557990,0.0,,0.783639
62557,Zimbabwe,ZWE,2019,11.114607,6.888320,3.656672,0.0,0.569615,0.0,,0.723861
62558,Zimbabwe,ZWE,2020,10.607897,6.721571,3.316712,0.0,0.569615,0.0,,0.676970


## 2.8. Handling Missing Values

Missing values are marked as NaN. There are a number of methods to deal with missing values in a DataFrame.

- By default, `sum()` skips missing values; if all values are missing, the sum is 0 (pass `min_count=1` to get NaN instead)
- Many descriptive statistics methods have a `skipna` option to control whether missing values are excluded
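A short sketch of both behaviours on hypothetical Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s.mean())               # 2.0: NaN is skipped by default (skipna=True)
print(s.mean(skipna=False))   # nan: any missing value propagates

all_missing = pd.Series([np.nan, np.nan])
print(all_missing.sum())             # 0.0 with the default min_count=0
print(all_missing.sum(min_count=1))  # nan: at least one real value is required
```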

### 2.8.1. Summary

In [70]:
# Show columns that have missing values


Country                   0
ISO 3166-1 alpha-3     1632
Year                      0
Total                   200
Coal                  41360
Oil                   41387
Gas                   41486
Cement                42290
Flaring               41554
Other                 61484
Per Capita            44130
dtype: int64

### 2.8.2. Methods to deal with missing values

![img2](Recursos/missing _val_methods.PNG)
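A sketch of the common options on a tiny DataFrame. The values are Zimbabwe's from this section's output, with two cells blanked on purpose; `numeric_only=True` is there so the same call also works on frames with text columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Year": [2018, 2019, 2020],
                   "Coal": [7.177776, np.nan, 6.721571],
                   "Oil": [4.059712, 3.656672, np.nan]})

missing_per_column = df.isnull().sum()             # Coal: 1, Oil: 1
dropped = df.dropna()                              # keep only rows without any NaN
zeros = df.fillna(0)                               # replace every NaN with 0
forward = df.ffill()                               # copy the last valid value downwards
col_means = df.fillna(df.mean(numeric_only=True))  # NaN -> mean of its column

for name, result in [("dropna", dropped), ("fillna(0)", zeros),
                     ("ffill", forward), ("fillna(mean)", col_means)]:
    print(name, result, sep="\n", end="\n\n")
```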

In [71]:
# Example with dropna()


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
3232,Australia,AUS,1990,278.154156,141.879819,88.842090,34.454816,3.462872,7.272496,2.242063,16.315938
3233,Australia,AUS,1991,279.528510,146.082840,88.245572,32.786243,3.183033,7.001201,2.229622,16.184767
3234,Australia,AUS,1992,284.525345,150.051381,87.916828,33.970472,2.923411,7.303701,2.359551,16.293502
3235,Australia,AUS,1993,288.870537,150.098575,90.386578,35.670002,3.004698,7.136743,2.573941,16.383765
3236,Australia,AUS,1994,293.696553,151.376241,91.924087,37.032005,3.484276,6.880148,2.999795,16.494706
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423


In [72]:
# Example with fillna()


Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,Afghanistan,AFG,1751,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,Afghanistan,AFG,1752,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Afghanistan,AFG,1753,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,Afghanistan,AFG,1754,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423


In [73]:
# Fill null values with ffill method



Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.000000,,,,,,,
1,Afghanistan,AFG,1751,0.000000,,,,,,,
2,Afghanistan,AFG,1752,0.000000,,,,,,,
3,Afghanistan,AFG,1753,0.000000,,,,,,,
4,Afghanistan,AFG,1754,0.000000,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423


In [74]:
# Fill missing values with mean column value



Unnamed: 0,Country,ISO 3166-1 alpha-3,Year,Total,Coal,Oil,Gas,Cement,Flaring,Other,Per Capita
0,Afghanistan,AFG,1750,0.000000,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
1,Afghanistan,AFG,1751,0.000000,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
2,Afghanistan,AFG,1752,0.000000,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
3,Afghanistan,AFG,1753,0.000000,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
4,Afghanistan,AFG,1754,0.000000,73.968916,55.760624,23.504285,4.330443,1.712695,10.951389,4.413363
...,...,...,...,...,...,...,...,...,...,...,...
63099,Global,WLD,2017,36096.739276,14506.973805,12242.627935,7144.928128,1507.923185,391.992176,302.294047,4.749682
63100,Global,WLD,2018,36826.506600,14746.830688,12266.016285,7529.846784,1569.218392,412.115746,302.478706,4.792753
63101,Global,WLD,2019,37082.558969,14725.978025,12345.653374,7647.528220,1617.506786,439.253991,306.638573,4.775633
63102,Global,WLD,2020,35264.085734,14174.564010,11191.808551,7556.290283,1637.537532,407.583673,296.301685,4.497423
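
A small sketch of mean imputation on a made-up frame, not the emissions data. `DataFrame.mean(numeric_only=True)` returns one mean per numeric column (NaNs are skipped). Passing that Series to `fillna` fills each column with its own mean, which matches the constant value repeated down each column in the output above.

```python
import numpy as np
import pandas as pd

# Made-up data: one text column and two numeric columns with gaps
df = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Coal": [np.nan, 2.0, 4.0],
    "Oil": [1.0, np.nan, 5.0],
})

# Mean of every numeric column; the text column is ignored
means = df.mean(numeric_only=True)

# fillna with a Series matches its index against the column names
filled = df.fillna(means)
print(filled)
```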


## 2.9. Merge Dataframes

In [75]:
# Create new dataframes

prof_p = pd.DataFrame({"id":["E90", "E87"], "Name": ["Jorge", "Freddy"]})
display(prof_p)

prof_c = pd.DataFrame({"id":["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})
display(prof_c)

address = pd.DataFrame({"id":["E87", "E22", "E49"],
                        "City":["Guayaquil", "Quito", "Cuenca"],
                        "Province":["Guayas", "Pichincha", "Azuay"]})
display(address)


Unnamed: 0,id,Name
0,E90,Jorge
1,E87,Freddy


Unnamed: 0,id,Name
0,E22,Andrés
1,E74,Nadia
2,E90,Jorge


Unnamed: 0,id,City,Province
0,E87,Guayaquil,Guayas
1,E22,Quito,Pichincha
2,E49,Cuenca,Azuay


In [76]:
# Concatenate the dataframes with pd.concat



Unnamed: 0,id,Name
0,E90,Jorge
1,E87,Freddy
0,E22,Andrés
1,E74,Nadia
2,E90,Jorge
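
One way to produce the output above, assuming the concatenated result is stored under the name `professors` (the name is an assumption; the frames themselves are the ones created earlier). `pd.concat` stacks the rows and keeps each frame's original index labels, which is why 0 and 1 appear twice.

```python
import pandas as pd

prof_p = pd.DataFrame({"id": ["E90", "E87"], "Name": ["Jorge", "Freddy"]})
prof_c = pd.DataFrame({"id": ["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})

# Rows of prof_c go under prof_p; index labels are kept as they were
professors = pd.concat([prof_p, prof_c])
print(professors)
```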


In [77]:
# Concatenate the dataframes with pd.concat, ignoring the original index



Unnamed: 0,id,Name
0,E90,Jorge
1,E87,Freddy
2,E22,Andrés
3,E74,Nadia
4,E90,Jorge
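
A sketch of the same concatenation with `ignore_index=True`, which discards the original labels and numbers the rows 0 to n-1, as in the output above. The name `professors` is assumed.

```python
import pandas as pd

prof_p = pd.DataFrame({"id": ["E90", "E87"], "Name": ["Jorge", "Freddy"]})
prof_c = pd.DataFrame({"id": ["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})

# ignore_index=True builds a fresh RangeIndex for the result
professors = pd.concat([prof_p, prof_c], ignore_index=True)
print(professors)
```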


In [78]:
# Check for duplicates after concatenation



Unnamed: 0,id,Name
4,E90,Jorge
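
A possible way to find the duplicate shown above (the frame name `professors` is assumed). `DataFrame.duplicated()` flags each row that repeats an earlier row across all columns; with the default `keep="first"`, only the second E90/Jorge row (index 4) is flagged.

```python
import pandas as pd

prof_p = pd.DataFrame({"id": ["E90", "E87"], "Name": ["Jorge", "Freddy"]})
prof_c = pd.DataFrame({"id": ["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})
professors = pd.concat([prof_p, prof_c], ignore_index=True)

# Boolean mask: True for every repeat of a row already seen above
dups = professors[professors.duplicated()]
print(dups)
```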


In [79]:
# Remove duplicated rows, keeping the last occurrence


Unnamed: 0,id,Name
1,E87,Freddy
2,E22,Andrés
3,E74,Nadia
4,E90,Jorge
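
In the output above, the E90 row that survives is the one at index 4, which is consistent with `drop_duplicates(keep="last")`. With the default `keep="first"`, index 0 would be kept instead. A sketch, assuming the name `professors`:

```python
import pandas as pd

prof_p = pd.DataFrame({"id": ["E90", "E87"], "Name": ["Jorge", "Freddy"]})
prof_c = pd.DataFrame({"id": ["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})
professors = pd.concat([prof_p, prof_c], ignore_index=True)

# keep="last" retains the final copy of each duplicated row
professors = professors.drop_duplicates(keep="last")
print(professors)
```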


In [80]:
# Reset the index so it runs 0, 1, 2, ... again



Unnamed: 0,id,Name
0,E87,Freddy
1,E22,Andrés
2,E74,Nadia
3,E90,Jorge
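
A sketch of renumbering the rows after the duplicates are removed (the name `professors` is assumed). `reset_index(drop=True)` replaces the gapped index with 0 to n-1 and discards the old labels instead of adding them as a new column.

```python
import pandas as pd

prof_p = pd.DataFrame({"id": ["E90", "E87"], "Name": ["Jorge", "Freddy"]})
prof_c = pd.DataFrame({"id": ["E22", "E74", "E90"], "Name": ["Andrés", "Nadia", "Jorge"]})
professors = pd.concat([prof_p, prof_c], ignore_index=True).drop_duplicates(keep="last")

# drop=True: do not keep the old index as a column
professors = professors.reset_index(drop=True)
print(professors)
```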


In [81]:
# Show the professors and address dataframes



Unnamed: 0,id,Name
0,E87,Freddy
1,E22,Andrés
2,E74,Nadia
3,E90,Jorge


Unnamed: 0,id,City,Province
0,E87,Guayaquil,Guayas
1,E22,Quito,Pichincha
2,E49,Cuenca,Azuay


### 2.9.1. Inner join

In [82]:
# Merge dataframes (professors, address) by Inner Join


Unnamed: 0,id,Name,City,Province
0,E87,Freddy,Guayaquil,Guayas
1,E22,Andrés,Quito,Pichincha
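
A sketch of the inner join, assuming the deduplicated frame is called `professors` (rebuilt here from the values shown above). `pd.merge(..., how="inner")` keeps only the ids present in both frames, so only E87 and E22 survive.

```python
import pandas as pd

professors = pd.DataFrame({"id": ["E87", "E22", "E74", "E90"],
                           "Name": ["Freddy", "Andrés", "Nadia", "Jorge"]})
address = pd.DataFrame({"id": ["E87", "E22", "E49"],
                        "City": ["Guayaquil", "Quito", "Cuenca"],
                        "Province": ["Guayas", "Pichincha", "Azuay"]})

# Only keys found in both frames appear in the result
inner = pd.merge(professors, address, on="id", how="inner")
print(inner)
```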



### 2.9.2. Left Outer Join

In [83]:
# Merge dataframes (professors, address) by Left Outer Join



Unnamed: 0,id,Name,City,Province
0,E87,Freddy,Guayaquil,Guayas
1,E22,Andrés,Quito,Pichincha
2,E74,Nadia,,
3,E90,Jorge,,
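
A sketch of the left outer join (the name `professors` is assumed). Every row of the left frame is kept; professors with no matching address, E74 and E90, get NaN in `City` and `Province`.

```python
import pandas as pd

professors = pd.DataFrame({"id": ["E87", "E22", "E74", "E90"],
                           "Name": ["Freddy", "Andrés", "Nadia", "Jorge"]})
address = pd.DataFrame({"id": ["E87", "E22", "E49"],
                        "City": ["Guayaquil", "Quito", "Cuenca"],
                        "Province": ["Guayas", "Pichincha", "Azuay"]})

# All left keys are kept; missing right-side values become NaN
left = pd.merge(professors, address, on="id", how="left")
print(left)
```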



### 2.9.3. Right Outer Join

In [84]:
# Merge dataframes (professors, address) by Right Outer Join



Unnamed: 0,id,Name,City,Province
0,E87,Freddy,Guayaquil,Guayas
1,E22,Andrés,Quito,Pichincha
2,E49,,Cuenca,Azuay
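
A sketch of the right outer join (the name `professors` is assumed). Every row of `address` is kept, and E49, which has no professor, gets NaN in `Name`.

```python
import pandas as pd

professors = pd.DataFrame({"id": ["E87", "E22", "E74", "E90"],
                           "Name": ["Freddy", "Andrés", "Nadia", "Jorge"]})
address = pd.DataFrame({"id": ["E87", "E22", "E49"],
                        "City": ["Guayaquil", "Quito", "Cuenca"],
                        "Province": ["Guayas", "Pichincha", "Azuay"]})

# All right keys are kept; missing left-side values become NaN
right = pd.merge(professors, address, on="id", how="right")
print(right)
```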


### 2.9.4. Full Outer Join

In [85]:
# Merge dataframes by Full Outer Join



Unnamed: 0,id,Name,City,Province
0,E87,Freddy,Guayaquil,Guayas
1,E22,Andrés,Quito,Pichincha
2,E74,Nadia,,
3,E90,Jorge,,
4,E49,,Cuenca,Azuay
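
A sketch of the full outer join (the name `professors` is assumed). `indicator=True` adds a `_merge` column telling whether each row came from both frames, only the left, or only the right, which is a handy check after a merge. Depending on the pandas version, the row order may differ from the output above, since recent versions sort the keys of an outer join.

```python
import pandas as pd

professors = pd.DataFrame({"id": ["E87", "E22", "E74", "E90"],
                           "Name": ["Freddy", "Andrés", "Nadia", "Jorge"]})
address = pd.DataFrame({"id": ["E87", "E22", "E49"],
                        "City": ["Guayaquil", "Quito", "Cuenca"],
                        "Province": ["Guayas", "Pichincha", "Azuay"]})

# Union of keys from both frames; _merge records where each row came from
full = pd.merge(professors, address, on="id", how="outer", indicator=True)
print(full)
```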


## Test Part 2

Analyse and clean the following dataset by doing the tasks below:

- Import and read the Excel file Capacitación FICT ABRIL - 2023.
- Show the summary of this dataset.
- Rename some columns.
- Drop unnecessary columns.
- Change "x" to registered and null values to unregistered.
- Show the number of people registered in at least two courses (*use the value_counts method*).
- Combine all the course columns into a single column (*use the melt method*).
- Show people registered in GIS Course.
- Save the final dataframe as a CSV file named FICT_Courses in the Día2/Datos directory.

In [86]:
# Import data from FICT Courses



Unnamed: 0,Marca temporal,Dirección de correo electrónico,Nombres Completos,Capacitaciones,AWC,Bienestar Psicológico,Herramientas para Generar Bienestar,Python,Unnamed: 8,Unnamed: 9
0,2023-03-31 13:51:37.715,cgoyburo@espol.edu.ec,Cindy Samanda Goyburo Chávez,x,x,x,x,x,,
1,2023-03-31 13:55:33.574,dvelaste@espol.edu.ec,Andrés Velástegui Montoya,,x,,,x,,
2,2023-03-31 13:55:34.843,dpaz@espol.edu.ec,Daniela Margarita Paz Barzola,x,x,x,x,x,,
3,2023-03-31 13:58:01.143,nlagasca@espol.edu.ec,Nadia José Lagasca Loaiza,x,x,x,x,x,,
4,2023-03-31 13:58:17.832,fetorres@espol.edu.ec,Federico Ricardo Torres Negrete,x,,x,x,x,,
5,2023-03-31 13:58:24.426,mmulas@espol.edu.ec,Maurizio Mulas,,x,x,x,,,
6,2023-03-31 14:00:45.818,ogarces@espol.edu.ec,Daniel Omar Garces Leon,,,x,,x,,
7,2023-03-31 14:03:05.350,pestolay@espol.edu.ec,Peter Olaya Carbo,,x,x,x,,,
8,2023-03-31 14:03:53.063,fpcarrio@espol.edu.ec,Freddy Paul Carrión Maldonado,x,x,x,x,,,
9,2023-03-31 14:09:00.103,angnmedi@espol.edu.ec,Angie Nicole Medina Toala,x,x,x,x,x,,


In [87]:
# Data summary


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 10 columns):
 #   Column                               Non-Null Count  Dtype         
---  ------                               --------------  -----         
 0   Marca temporal                       52 non-null     datetime64[ns]
 1   Dirección de correo electrónico      52 non-null     object        
 2   Nombres Completos                    52 non-null     object        
 3   Capacitaciones                       19 non-null     object        
 4   AWC                                  27 non-null     object        
 5   Bienestar Psicológico                24 non-null     object        
 6   Herramientas para Generar Bienestar  27 non-null     object        
 7   Python                               32 non-null     object        
 8   Unnamed: 8                           0 non-null      float64       
 9   Unnamed: 9                           0 non-null      float64       
dtypes: datetime64[ns

In [88]:
# Rename columns


Unnamed: 0,Marca temporal,Correo,Nombres,GIS,AWC,Bienestar Psicológico,Herramientas para Generar Bienestar,Python,Unnamed: 8,Unnamed: 9
0,2023-03-31 13:51:37.715,cgoyburo@espol.edu.ec,Cindy Samanda Goyburo Chávez,x,x,x,x,x,,
1,2023-03-31 13:55:33.574,dvelaste@espol.edu.ec,Andrés Velástegui Montoya,,x,,,x,,
2,2023-03-31 13:55:34.843,dpaz@espol.edu.ec,Daniela Margarita Paz Barzola,x,x,x,x,x,,
3,2023-03-31 13:58:01.143,nlagasca@espol.edu.ec,Nadia José Lagasca Loaiza,x,x,x,x,x,,
4,2023-03-31 13:58:17.832,fetorres@espol.edu.ec,Federico Ricardo Torres Negrete,x,,x,x,x,,
5,2023-03-31 13:58:24.426,mmulas@espol.edu.ec,Maurizio Mulas,,x,x,x,,,
6,2023-03-31 14:00:45.818,ogarces@espol.edu.ec,Daniel Omar Garces Leon,,,x,,x,,
7,2023-03-31 14:03:05.350,pestolay@espol.edu.ec,Peter Olaya Carbo,,x,x,x,,,
8,2023-03-31 14:03:53.063,fpcarrio@espol.edu.ec,Freddy Paul Carrión Maldonado,x,x,x,x,,,
9,2023-03-31 14:09:00.103,angnmedi@espol.edu.ec,Angie Nicole Medina Toala,x,x,x,x,x,,
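
A sketch of the renaming step on a two-row stand-in for the survey sheet (the names and emails are made up). `DataFrame.rename(columns=...)` takes an old-to-new mapping and leaves unlisted columns untouched. The mapping mirrors the output above, where `Capacitaciones` becomes `GIS`.

```python
import pandas as pd

# Made-up stand-in with the same column names as the survey file
df = pd.DataFrame({
    "Marca temporal": pd.to_datetime(["2023-03-31 13:51:37", "2023-03-31 13:55:33"]),
    "Dirección de correo electrónico": ["a@example.com", "b@example.com"],
    "Nombres Completos": ["Persona A", "Persona B"],
    "Capacitaciones": ["x", None],
})

# Old name -> new name; columns not in the mapping keep their names
df = df.rename(columns={
    "Dirección de correo electrónico": "Correo",
    "Nombres Completos": "Nombres",
    "Capacitaciones": "GIS",
})
print(df.columns.tolist())
```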


In [89]:
# Drop unnecessary columns



Unnamed: 0,Correo,Nombres,GIS,AWC,Bienestar Psicológico,Herramientas para Generar Bienestar,Python
0,cgoyburo@espol.edu.ec,Cindy Samanda Goyburo Chávez,x,x,x,x,x
1,dvelaste@espol.edu.ec,Andrés Velástegui Montoya,,x,,,x
2,dpaz@espol.edu.ec,Daniela Margarita Paz Barzola,x,x,x,x,x
3,nlagasca@espol.edu.ec,Nadia José Lagasca Loaiza,x,x,x,x,x
4,fetorres@espol.edu.ec,Federico Ricardo Torres Negrete,x,,x,x,x
5,mmulas@espol.edu.ec,Maurizio Mulas,,x,x,x,
6,ogarces@espol.edu.ec,Daniel Omar Garces Leon,,,x,,x
7,pestolay@espol.edu.ec,Peter Olaya Carbo,,x,x,x,
8,fpcarrio@espol.edu.ec,Freddy Paul Carrión Maldonado,x,x,x,x,
9,angnmedi@espol.edu.ec,Angie Nicole Medina Toala,x,x,x,x,x
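
A sketch of dropping the columns that disappear in the output above: the timestamp and the two empty `Unnamed` columns. The frame is a made-up stand-in. `DataFrame.drop(columns=[...])` removes columns by name. An alternative for the empty ones is `dropna(axis=1, how="all")`, which removes every column that has no values at all.

```python
import numpy as np
import pandas as pd

# Made-up stand-in after renaming
df = pd.DataFrame({
    "Marca temporal": pd.to_datetime(["2023-03-31 13:51:37", "2023-03-31 13:55:33"]),
    "Correo": ["a@example.com", "b@example.com"],
    "Nombres": ["Persona A", "Persona B"],
    "GIS": ["x", np.nan],
    "Unnamed: 8": [np.nan, np.nan],
    "Unnamed: 9": [np.nan, np.nan],
})

# Remove the timestamp and the two empty spreadsheet columns by name
df = df.drop(columns=["Marca temporal", "Unnamed: 8", "Unnamed: 9"])
print(df.columns.tolist())
```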


In [90]:
# Change x values to registered and null values to unregistered



Unnamed: 0,Correo,Nombres,GIS,AWC,Bienestar Psicológico,Herramientas para Generar Bienestar,Python
0,cgoyburo@espol.edu.ec,Cindy Samanda Goyburo Chávez,registered,registered,registered,registered,registered
1,dvelaste@espol.edu.ec,Andrés Velástegui Montoya,unregistered,registered,unregistered,unregistered,registered
2,dpaz@espol.edu.ec,Daniela Margarita Paz Barzola,registered,registered,registered,registered,registered
3,nlagasca@espol.edu.ec,Nadia José Lagasca Loaiza,registered,registered,registered,registered,registered
4,fetorres@espol.edu.ec,Federico Ricardo Torres Negrete,registered,unregistered,registered,registered,registered
5,mmulas@espol.edu.ec,Maurizio Mulas,unregistered,registered,registered,registered,unregistered
6,ogarces@espol.edu.ec,Daniel Omar Garces Leon,unregistered,unregistered,registered,unregistered,registered
7,pestolay@espol.edu.ec,Peter Olaya Carbo,unregistered,registered,registered,registered,unregistered
8,fpcarrio@espol.edu.ec,Freddy Paul Carrión Maldonado,registered,registered,registered,registered,unregistered
9,angnmedi@espol.edu.ec,Angie Nicole Medina Toala,registered,registered,registered,registered,registered
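
A sketch of the recoding step on made-up data with only three of the course columns. `replace("x", "registered")` swaps the marker, and `fillna("unregistered")` then fills the empty cells, applied only to the course columns so `Nombres` is left alone.

```python
import numpy as np
import pandas as pd

# Made-up stand-in: "x" marks a registration, NaN means none
df = pd.DataFrame({
    "Nombres": ["Persona A", "Persona B"],
    "GIS": ["x", np.nan],
    "AWC": [np.nan, "x"],
    "Python": ["x", "x"],
})
courses = ["GIS", "AWC", "Python"]

# Recode only the course columns
df[courses] = df[courses].replace("x", "registered").fillna("unregistered")
print(df)
```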


In [91]:
# Show results from GIS Course


unregistered    33
registered      19
Name: GIS, dtype: int64
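
A short sketch of `Series.value_counts()` on a made-up course column. By default it returns counts sorted from most to least frequent. `normalize=True` gives the same result as proportions, which can be easier to compare across courses of different sizes.

```python
import pandas as pd

# Made-up answers for one course column
gis = pd.Series(["registered", "unregistered", "unregistered"], name="GIS")

counts = gis.value_counts()                 # absolute counts, largest first
shares = gis.value_counts(normalize=True)   # the same counts as proportions
print(counts)
print(shares)
```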

In [92]:
# Show results from Python Course


registered      32
unregistered    20
Name: Python, dtype: int64
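
The task list also asks for the number of people registered in at least two courses. One possible approach, sketched on made-up data: compare the course columns with `"registered"`, sum the resulting booleans across each row, then inspect the totals with `value_counts`.

```python
import pandas as pd

# Made-up stand-in after recoding
df = pd.DataFrame({
    "Nombres": ["Persona A", "Persona B", "Persona C"],
    "GIS": ["registered", "unregistered", "unregistered"],
    "AWC": ["registered", "registered", "unregistered"],
    "Python": ["unregistered", "registered", "registered"],
})
courses = ["GIS", "AWC", "Python"]

# True counts as 1, so the row sum is the number of courses per person
n_courses = (df[courses] == "registered").sum(axis=1)

print(n_courses.value_counts())   # how many people took 1, 2, ... courses
print((n_courses >= 2).sum())     # people registered in at least two courses
```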

In [93]:
# Combine all the course columns into a single column



Unnamed: 0,Nombres,Course,Answer
0,Cindy Samanda Goyburo Chávez,GIS,registered
1,Andrés Velástegui Montoya,GIS,unregistered
2,Daniela Margarita Paz Barzola,GIS,registered
3,Nadia José Lagasca Loaiza,GIS,registered
4,Federico Ricardo Torres Negrete,GIS,registered
...,...,...,...
255,María Cecibel Castillo Olvera,Python,registered
256,Mónica López Moncada,Python,unregistered
257,Katherine Tatiana Escobar Pesantes,Python,unregistered
258,CRISTIAN ALFONSO SALAS VÁZQUEZ,Python,registered
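
A sketch of the wide-to-long reshape on made-up data. `DataFrame.melt` keeps `id_vars` as they are and turns every other column into (`var_name`, `value_name`) pairs, one row per person and course. In the output above `Correo` does not appear, so it was presumably dropped first or excluded by passing `value_vars` with only the course columns.

```python
import pandas as pd

# Made-up stand-in: one row per person, one column per course
df = pd.DataFrame({
    "Nombres": ["Persona A", "Persona B"],
    "GIS": ["registered", "unregistered"],
    "Python": ["unregistered", "registered"],
})

# One row per (person, course); rows are stacked course by course
long_df = df.melt(id_vars="Nombres", var_name="Course", value_name="Answer")
print(long_df)
```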


In [94]:
# People registered in GIS Course


Unnamed: 0,Nombres,Course,Answer
0,Cindy Samanda Goyburo Chávez,GIS,registered
2,Daniela Margarita Paz Barzola,GIS,registered
3,Nadia José Lagasca Loaiza,GIS,registered
4,Federico Ricardo Torres Negrete,GIS,registered
8,Freddy Paul Carrión Maldonado,GIS,registered
9,Angie Nicole Medina Toala,GIS,registered
13,Karla Maytee Villamar Marazita,GIS,registered
16,Jairo Dueñas Tovar,GIS,registered
19,Gina Alejandra Peña Villacreses,GIS,registered
21,"Paulina Elizabeth Vilela Govea, Ph. D.",GIS,registered
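
A sketch of the filter on a made-up long-format frame. Two boolean masks are combined with `&`, each wrapped in parentheses because `&` binds more tightly than `==`. The original index labels are kept, which is why the output above skips numbers.

```python
import pandas as pd

# Made-up long-format data
long_df = pd.DataFrame({
    "Nombres": ["Persona A", "Persona B", "Persona A", "Persona B"],
    "Course": ["GIS", "GIS", "Python", "Python"],
    "Answer": ["registered", "unregistered", "unregistered", "registered"],
})

# Rows for the GIS course where the person registered
gis_registered = long_df[(long_df["Course"] == "GIS") & (long_df["Answer"] == "registered")]
print(gis_registered)
```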


In [95]:
# Save dataframe as a csv file


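
A sketch of the final save step, using a one-row made-up frame and assuming the `Día2/Datos` directory is relative to the notebook's working directory. `os.makedirs(..., exist_ok=True)` creates the folder if needed, and `index=False` keeps the row labels out of the file.

```python
import os

import pandas as pd

# Made-up final frame
long_df = pd.DataFrame({"Nombres": ["Persona A"], "Course": ["GIS"], "Answer": ["registered"]})

# Assumed relative location of the output folder
out_dir = os.path.join("Día2", "Datos")
os.makedirs(out_dir, exist_ok=True)

out_path = os.path.join(out_dir, "FICT_Courses.csv")
long_df.to_csv(out_path, index=False)  # do not write the index as a column
print(out_path)
```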