# Data Visualization in Python
#### Tables, Histograms, Boxplots, and Slicing for Statistics

When working with a new data set, one of the most useful first steps is to visualize it. Tables, histograms, boxplots, and other visual tools give a better sense of what the data may be telling us and can reveal insights we might not have found otherwise.

We will go over how to produce some basic visualizations in Python and, most importantly, learn how to begin exploring data from a graphical perspective.

In [None]:
# We first need to import the packages that we will be using
import seaborn as sns # For plotting
import pandas as pd
import matplotlib.pyplot as plt # For showing plots

# Load in the data set
tips_data = pd.read_csv("Datasets/tips.csv")


#### Visualizing the Data - Tables
When you begin working with a new data set, it is often best to print the first few rows before starting any other analysis. This shows what kind of data the data set contains and which data types you are working with, and it serves as a reference for the plots that we are about to make.

In [None]:
# Print out the first few rows of the data
tips_data.head()
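
Beyond `head()`, a few other quick checks help confirm what was loaded. The sketch below uses a small hand-made frame named `sample` (a hypothetical stand-in for `tips_data`; its values are invented for illustration and only the column names `total_bill`, `day`, and `size` come from this notebook).

```python
import pandas as pd

# Hypothetical stand-in for tips_data; the values are made up for illustration.
sample = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "day": ["Sun", "Sun", "Sat", "Sat", "Thur", "Fri"],
    "size": [2, 3, 3, 2, 4, 4],
})

first_rows = sample.head(3)     # first 3 rows instead of the default 5
n_rows, n_cols = sample.shape   # (number of rows, number of columns)
column_types = sample.dtypes    # one dtype per column

# DataFrame.size is the total number of cells, not the 'size' column,
# so the column has to be selected with brackets: sample["size"].
n_cells = sample.size
size_column = sample["size"]

print(first_rows)
print(n_rows, n_cols)
print(column_types)
```

The last two lines of the setup are the reason the cells in this notebook write `tips_data['size']` rather than `tips_data.size`.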

#### Describing Data
Summary statistics, such as the mean, minimum, and maximum, give a feel for the scale of each variable and can hint at which variables may be the most important.

In [None]:
# Print out the summary statistics for the quantitative variables
tips_data.describe()

In [None]:
# Summary statistics for a single column: 'size'
desc = tips_data['size'].describe()
desc

In [None]:
tips_data['total_bill'].describe()
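
For a numeric column, `describe()` returns a pandas Series indexed by statistic name, so individual values can be pulled out by label. A minimal sketch, assuming a made-up `sizes` Series rather than the real `size` column:

```python
import pandas as pd

# Made-up values for illustration; not taken from tips.csv.
sizes = pd.Series([2, 3, 3, 2, 4, 4], name="size")

desc = sizes.describe()

# The result is indexed by: count, mean, std, min, 25%, 50%, 75%, max
stat_names = list(desc.index)

mean_size = desc["mean"]
median_size = desc["50%"]
iqr = desc["75%"] - desc["25%"]  # interquartile range, the box in a boxplot

print(desc)
print(mean_size, median_size, iqr)
```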

In [None]:
# Boolean mask: True where 3 < size < 7 (built from 'size', not from the bill)
just_bills = (tips_data['size'] > 3) & (tips_data['size'] < 7)
just_bills

In [None]:
# Keep only the rows where the mask is True
tips_data1 = tips_data[just_bills]
tips_data1
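
The filter above combines two comparisons with `&`. Each comparison needs its own parentheses because `&` binds more tightly than `>` and `<` in Python. A minimal sketch on invented data (the frame `sample` and its values are hypothetical); for integer values, `Series.between(4, 6)` should give the same mask:

```python
import pandas as pd

# Made-up values for illustration.
sample = pd.DataFrame({
    "total_bill": [12.5, 18.0, 30.2, 48.3, 52.1, 41.0],
    "size": [2, 3, 4, 6, 7, 5],
})

# Element-wise AND of two boolean Series; parentheses are required.
mask = (sample["size"] > 3) & (sample["size"] < 7)

# Equivalent for integer sizes: between() is inclusive on both ends by default.
same_mask = sample["size"].between(4, 6)

# Indexing with the mask keeps the True rows and their original index labels.
subset = sample[mask]

print(mask)
print(subset)
```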

In [None]:
tips_data1['size'].unique()

In [None]:
tips_data1['size'].value_counts()


In [None]:
# Plot a histogram of size for the filtered rows
# (sns.distplot is deprecated since seaborn 0.11; histplot replaces it for histograms)
sns.histplot(tips_data1["size"], kde=False).set_title("Histogram of size")
plt.show()
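
Because `size` takes only a few distinct values, the bar heights of the histogram can be checked numerically with `value_counts()`. A minimal sketch on made-up values (the `sizes` Series is hypothetical):

```python
import pandas as pd

# Made-up values for illustration.
sizes = pd.Series([4, 4, 5, 6, 4, 5], name="size")

# value_counts() orders by frequency; sort_index() reorders by value,
# which matches the left-to-right order of bars in a histogram.
counts = sizes.value_counts().sort_index()

print(counts)
```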

In [None]:
# Same row mask, but select only the 'size' column
tips_data2 = tips_data.loc[just_bills, 'size']
tips_data2

In [None]:
# Same row mask, selecting several columns by passing a list
tips_data3 = tips_data.loc[just_bills, ['size', 'total_bill']]
tips_data3
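
The two `.loc` calls above differ in what they return: a single column label gives a Series, while a list of labels gives a DataFrame, even if the list has one entry. A minimal sketch on invented data (`sample` and its values are hypothetical):

```python
import pandas as pd

# Made-up values for illustration.
sample = pd.DataFrame({
    "total_bill": [12.5, 18.0, 30.2, 48.3, 52.1, 41.0],
    "size": [2, 3, 4, 6, 7, 5],
})
mask = sample["size"].between(4, 6)

as_series = sample.loc[mask, "size"]                 # single label -> Series
as_frame = sample.loc[mask, ["size"]]                # list of labels -> DataFrame
two_cols = sample.loc[mask, ["size", "total_bill"]]  # columns in the order listed

print(type(as_series).__name__, type(as_frame).__name__)
print(two_cols)
```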

In [None]:
#tips_data1['day']
tips_data1

In [None]:
# Mask for rows whose day is one of the preferred days
preferred_days = ['Sun', 'Sat']
filt = tips_data1['day'].isin(preferred_days)

In [None]:
tips_data1.loc[filt]

In [None]:
tips_data1.loc[filt, 'size']
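
The cells above filter in two steps: first by `size`, then by `day` on the already filtered frame. The same rows can be selected in one step by combining both masks on the original frame. A minimal sketch on invented data (`sample` and its values are hypothetical):

```python
import pandas as pd

# Made-up values for illustration.
sample = pd.DataFrame({
    "day": ["Sun", "Sat", "Thur", "Fri", "Sat", "Sun"],
    "size": [2, 4, 5, 4, 3, 6],
})

size_mask = (sample["size"] > 3) & (sample["size"] < 7)
day_mask = sample["day"].isin(["Sun", "Sat"])

# Two steps, as in the cells above: the second mask is built from the
# filtered frame so that its index matches the frame it is applied to.
filtered = sample[size_mask]
two_step = filtered.loc[filtered["day"].isin(["Sun", "Sat"])]

# One step: combine both masks on the original frame.
one_step = sample.loc[size_mask & day_mask]

print(one_step)
```

Building each mask from the frame it will index avoids index-alignment surprises when the frames have different rows.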

In [None]:
print('='*30,'Good luck!', '='*30)