
# **Pandas Exercises**
## *Patalagua Suárez Nicolás*
### Universidad Sergio Arboleda

### ***What is Pandas?***

In computing and data science, pandas is a software library for data manipulation and analysis in the Python programming language, built on top of NumPy. In particular, it offers data structures and operations for numerical tables and time series. It is free software released under the three-clause BSD license.

 https://pandas.pydata.org/

### ***What is NumPy?***

NumPy is a Python library that adds support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions to operate on them. NumPy's ancestor, Numeric, was originally created by Jim Hugunin with contributions from other developers. In 2005, Travis Oliphant created NumPy by incorporating features of Numarray into Numeric, with some modifications.


 https://numpy.org/

### ***Repository***
This work is based on the repository https://github.com/guipsamora/pandas_exercises, developed by Guilherme Samora, a Senior Product Manager at the Global Savings Group in Munich, Germany.

## ***Getting and knowing***

### ***Chipotle***

Dataset and materials: https://github.com/justmarkham 

**Step 1.** *Import the necessary libraries*

In [0]:
#Import the pandas library and assign it to the variable pd
import pandas as pd
#Import the numpy library and assign it to the variable np
import numpy as np

**Step 2.** *Import the dataset from this address.*


In [0]:
#Store the URL of the raw TSV file in the variable dataset
dataset = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'

**Step 3.** *Assign it to a variable called chipo.*

In [0]:
#Read the tab-separated file into a DataFrame called chipo
chipo = pd.read_csv(dataset, sep = '\t')

**Step 4.** *See the first 10 entries.*

In [177]:
#head(n) returns the first n rows of the DataFrame
chipo.head(10)

| | order_id | quantity | item_name | choice_description | item_price |
|---|---|---|---|---|---|
| 0 | 1 | 1 | Chips and Fresh Tomato Salsa | NaN | $2.39 |
| 1 | 1 | 1 | Izze | [Clementine] | $3.39 |
| 2 | 1 | 1 | Nantucket Nectar | [Apple] | $3.39 |
| 3 | 1 | 1 | Chips and Tomatillo-Green Chili Salsa | NaN | $2.39 |
| 4 | 2 | 2 | Chicken Bowl | [Tomatillo-Red Chili Salsa (Hot), [Black Beans... | $16.98 |
| 5 | 3 | 1 | Chicken Bowl | [Fresh Tomato Salsa (Mild), [Rice, Cheese, Sou... | $10.98 |
| 6 | 3 | 1 | Side of Chips | NaN | $1.69 |
| 7 | 4 | 1 | Steak Burrito | [Tomatillo Red Chili Salsa, [Fajita Vegetables... | $11.75 |
| 8 | 4 | 1 | Steak Soft Tacos | [Tomatillo Green Chili Salsa, [Pinto Beans, Ch... | $9.25 |
| 9 | 5 | 1 | Steak Burrito | [Fresh Tomato Salsa, [Rice, Black Beans, Pinto... | $9.25 |


**Step 5.** *What is the number of observations in the dataset?*

In [0]:
#shape is a (rows, columns) tuple, so shape[0] is the number of rows
chipo.shape[0]

4622

**Step 6.** *What is the number of columns in the dataset?*

In [0]:
#shape is a (rows, columns) tuple, so shape[1] is the number of columns
chipo.shape[1]

5

**Step 7.** *Print the name of all the columns.*

In [0]:
#The columns attribute holds the column labels of the DataFrame
chipo.columns

Index(['order_id', 'quantity', 'item_name', 'choice_description',
       'item_price'],
      dtype='object')

**Step 8.** *How is the dataset indexed?*

In [0]:
#The index attribute holds the row labels; here it is a default RangeIndex
chipo.index

RangeIndex(start=0, stop=4622, step=1)

**Step 9.** *Which was the most-ordered item?*

In [0]:
#Group by item, sum the numeric columns and sort by quantity, largest first
ObjItem = chipo.groupby('item_name')[['order_id', 'quantity']].sum().sort_values('quantity', ascending=False)
#The index label of the first row is the most-ordered item
ObjItem.index[0]

'Chicken Bowl'

**Step 10.** *For the most-ordered item, how many items were ordered?*

In [0]:
#Total quantity of the most-ordered item (first row, quantity column)
ObjItem['quantity'].iloc[0]

761
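
The pattern behind Steps 9 and 10 (group, sum, sort, take the first row) can be checked on a small hand-made frame. This is only a sketch: the rows below are invented for illustration and do not come from the Chipotle file, and `idxmax` is shown as an alternative to sorting, not as the approach used in this notebook.

```python
import pandas as pd

# Hypothetical toy orders, not taken from the Chipotle dataset
toy = pd.DataFrame({
    'item_name': ['Bowl', 'Taco', 'Bowl', 'Chips', 'Taco', 'Bowl'],
    'quantity':  [2, 1, 1, 3, 1, 2],
})

# Total quantity per item, largest first
totals = toy.groupby('item_name')['quantity'].sum().sort_values(ascending=False)

top_item = totals.index[0]      # label of the first (largest) row
top_quantity = totals.iloc[0]   # total quantity for that label

# idxmax gives the same label without sorting the whole Series
assert totals.idxmax() == top_item

print(top_item, top_quantity)
```

Selecting the `quantity` column before calling `sum()` keeps the aggregation to the one column that matters and avoids summing identifiers such as `order_id`.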

**Step 11.** *What was the most ordered item in the choice_description column?*

In [165]:
#Group by choice_description, sum the quantities and sort, largest first
ObjItem = chipo.groupby('choice_description')['quantity'].sum().sort_values(ascending=False)
#The index label of the first row is the most-ordered choice
ObjItem.index[0]

'[Diet Coke]'

**Step 12.** *How many items were ordered in total?*

In [133]:
#Sum the quantity column to get the total number of items ordered
chipo.quantity.sum()

4972

**Step 13.** *Turn the item price into a float.*

> **a.** *Check the item price type.*



In [178]:
#The dtype attribute gives the data type of the selected column ('O' means object, here strings)
chipo.item_price.dtype

dtype('O')



> **b.** *Create a lambda function and change the type of item price.*



In [0]:
#Strip surrounding whitespace and the leading '$' from each price, convert it to float
#and assign the result back to the item_price column
chipo.item_price = chipo.item_price.apply(lambda x: float(x.strip().lstrip('$')))


>**c.** *Check the item price type.*



In [180]:
#recheck data type
chipo.item_price.dtype

dtype('float64')
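
The same conversion can also be written with vectorized string methods instead of a lambda. The sketch below runs on a few invented price strings, assumed to have a leading `$` and possibly trailing whitespace; it is an alternative formulation, not the exercise's required approach.

```python
import pandas as pd

# Hypothetical price strings, assumed to look like the item_price column
prices = pd.Series(['$2.39 ', '$16.98 ', '$1.69 '])

# Remove the literal '$' (regex=False), trim whitespace, then cast to float
as_float = prices.str.replace('$', '', regex=False).str.strip().astype(float)

print(as_float.dtype)
```

Passing `regex=False` matters here because `$` is a special character in regular expressions.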

**Step 14.** *How much was the revenue for the period in the dataset?*

In [186]:
#Multiply quantity by price row by row and sum the results
(chipo['quantity']*chipo['item_price']).sum()

39237.02

**Step 15.** *How many orders were made in the period?*

In [197]:
#nunique returns the number of distinct order_id values,
#which is the number of orders
chipo.order_id.nunique()

1834

**Step 16.** *What is the average revenue amount per order?*

In [206]:
#Add a revenue column (quantity times price), sum it per order_id
#and take the mean of the per-order totals
chipo['revenue'] = chipo['quantity'] * chipo['item_price']
chipo.groupby('order_id')['revenue'].sum().mean()

21.394231188658654
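
The average revenue per order is the mean of the per-order totals, which equals total revenue divided by the number of distinct orders. A minimal sketch on invented rows (the column names mirror the notebook, the values are hypothetical):

```python
import pandas as pd

# Hypothetical toy orders with numeric prices
toy = pd.DataFrame({
    'order_id':   [1, 1, 2, 3, 3],
    'quantity':   [1, 2, 1, 1, 1],
    'item_price': [2.0, 3.0, 10.0, 4.0, 6.0],
})

# Revenue per row, then total revenue per order
toy['revenue'] = toy['quantity'] * toy['item_price']
per_order = toy.groupby('order_id')['revenue'].sum()

# Mean of the per-order totals
average = per_order.mean()

# Equivalent formulation: total revenue over number of distinct orders
alternative = toy['revenue'].sum() / toy['order_id'].nunique()

print(per_order.tolist(), average, alternative)
```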

**Step 17.** *How many different items are sold?*

In [211]:
#nunique returns the number of distinct item names
chipo.item_name.nunique()

50
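
Distinct values can be counted either with `value_counts().count()` or with `nunique()`. The sketch below compares the two on an invented Series that includes a missing value, since that is where the defaults are worth knowing.

```python
import pandas as pd

# Hypothetical labels with one missing entry
s = pd.Series(['a', 'b', 'a', None, 'c', 'b'])

# value_counts drops missing values by default, so count() sees 3 labels
via_value_counts = s.value_counts().count()

# nunique also ignores missing values by default
via_nunique = s.nunique()

# dropna=False counts the missing value as one more distinct entry
with_missing = s.nunique(dropna=False)

print(via_value_counts, via_nunique, with_missing)
```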

### ***Occupation***

Dataset and materials: https://github.com/justmarkham

**Step 1.** *Import the necessary libraries.*

**Step 2.** *Import the dataset from this address.*

**Step 3.** *Assign it to a variable called users and use the 'user_id' as index.*

**Step 4.** *See the first 25 entries.*

**Step 5.** *See the last 10 entries.*

**Step 6.** *What is the number of observations in the dataset?*

**Step 7.** *What is the number of columns in the dataset?*

**Step 8.** *Print the name of all the columns.*

**Step 9.** *How is the dataset indexed?*

**Step 10.** *What is the data type of each column?*

**Step 11.** *Print only the occupation column.*

**Step 12.** *How many different occupations are in this dataset?*

**Step 13.** *What is the most frequent occupation?*

**Step 14.** *Summarize the DataFrame.*

**Step 15.** *Summarize all the columns.*

**Step 16.** *Summarize only the occupation column.*

**Step 17.** *What is the mean age of users?*

**Step 18.** *What is the age with least occurrence?*
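
No solutions are recorded for this exercise. The following is a sketch of how several of the steps might look, run on a small invented sample so that it works without a download. The pipe-delimited layout and the column names other than `user_id` and `occupation` are assumptions about the real file, and every row is hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample in an assumed pipe-delimited layout
sample = io.StringIO(
    "user_id|age|gender|occupation|zip_code\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "3|23|M|writer|32067\n"
    "4|24|M|technician|43537\n"
    "5|33|F|other|15213\n"
    "6|42|M|technician|98101\n"
)

# Step 3: read the data and use user_id as the index
users = pd.read_csv(sample, sep='|', index_col='user_id')

# Steps 4 and 5: first and last entries
first_rows = users.head(25)
last_rows = users.tail(10)

# Steps 6 and 7: number of observations and columns
n_rows, n_cols = users.shape

# Steps 12 and 13: distinct occupations and the most frequent one
n_occupations = users['occupation'].nunique()
top_occupation = users['occupation'].value_counts().idxmax()

# Steps 14 to 16: summaries of numeric columns, all columns, one column
numeric_summary = users.describe()
full_summary = users.describe(include='all')
occupation_summary = users['occupation'].describe()

# Step 17: mean age
mean_age = users['age'].mean()

# Step 18: ages ordered from least to most frequent
age_counts = users['age'].value_counts().sort_values()

print(n_rows, n_cols, n_occupations, top_occupation, mean_age)
```

With the real dataset, the `io.StringIO` object would be replaced by the file's path or URL, with the separator adjusted if the file uses a different one.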

### ***World Food Facts***


**Step 1.** *Go to https://www.kaggle.com/openfoodfacts/world-food-facts/data*

**Step 2.** *Download the dataset to your computer and unzip it.*

**Step 3.** *Use the tsv file and assign it to a dataframe called food.*

**Step 4.** *See the first 5 entries.*

**Step 5.** *What is the number of observations in the dataset?*

**Step 6.** *What is the number of columns in the dataset?*

**Step 7.** *Print the name of all the columns.*

**Step 8.** *What is the name of 105th column?*


**Step 9.** *What is the type of the observations of the 105th column?*

**Step 10.** *How is the dataset indexed?*

**Step 11.** *What is the product name of the 19th observation?*
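
No solutions are recorded for this exercise either. Steps 8, 9 and 11 all rely on positional access, where the n-th item sits at position n - 1. The sketch below shows that idea on a tiny invented frame; the column name `product_name` and the helper `nth_column` are assumptions made for illustration, and with the real data the positions would be 105 and 19.

```python
import pandas as pd

# Hypothetical stand-in for the food DataFrame
food = pd.DataFrame({
    'code': ['001', '002', '003'],
    'product_name': ['Granola', 'Peanuts', 'Oat bran'],
    'energy_100g': [1900.0, 2400.0, 1500.0],
})

def nth_column(df, n):
    """Return (name, dtype) of the n-th column, counting from 1."""
    return df.columns[n - 1], df.dtypes.iloc[n - 1]

# Name and type of the 3rd column (the 105th in the real exercise)
name, dtype = nth_column(food, 3)

# Product name of the 2nd observation (the 19th in the real exercise)
second_product = food['product_name'].iloc[2 - 1]

print(name, dtype, second_product)
```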

## ***Filtering and Sorting***

## ***Grouping***

## ***Apply***

## ***Merge***

## ***Stats***


## ***Visualization***

## ***Creating Series and DataFrames***

## ***Time Series***

## ***Deleting***