## Python data visualization

When working with data, it can be difficult to truly understand it in a tabular format.
Often we need to visualize it, that is, represent it as an image. This helps expose patterns,
correlations, and trends that are hard to spot when the data sits in a table.

Data visualization is the field of data analysis that deals with the visual representation of data. It plots data graphically and is
an effective way to communicate inferences drawn from data.

# Matplotlib

A histogram is a graph showing a frequency distribution: it splits the range of a numeric variable into intervals (bins)

and shows the number of observations that fall within each interval.
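To make "the number of observations within each interval" concrete, the sketch below counts a small hypothetical sample into four equal-width bins with NumPy's `np.histogram`, then draws the same bins with `plt.hist`. The sample values are made up for illustration and are not taken from `iris_data.txt`. As in NumPy, every bin is half-open, [left, right), except the last one, which also includes its right edge.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample of ten measurements (illustrative values, not from iris_data.txt)
values = pd.Series([4.9, 5.1, 5.4, 5.8, 6.0, 6.1, 6.3, 6.7, 7.2, 7.7])

# Split the range 4.0-8.0 into four equal-width bins and count the observations in each
counts, edges = np.histogram(values, bins=4, range=(4.0, 8.0))
print(edges)   # [4. 5. 6. 7. 8.]
print(counts)  # [1 3 4 2]

# plt.hist performs the same binning and draws one bar per bin, with height = count
n, bins, patches = plt.hist(values, bins=edges, edgecolor='black')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()
```

`DataFrame.hist` works the same way; it simply chooses 10 bins by default, which you can change with its `bins` parameter.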

In [None]:
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
# The file has no header row, so supply the column names explicitly
df = pd.read_csv('iris_data.txt', sep=',', header=None,
                 names=['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width', 'Species'])

In [None]:
# Plot a histogram of a single column in the DataFrame
df.hist(column='Sepal.Length')

# Set the title and axis labels
plt.title('Histogram of Sepal.Length')
plt.xlabel('Sepal length')
plt.ylabel('Frequency')

# Display the histogram
plt.show()

## Box plot

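A box plot depicts groups of numerical data through their quartiles. The box spans the first to the third quartile
(the interquartile range, IQR) with a line at the median, and by default in Matplotlib the whiskers extend to the most
extreme data points within 1.5 × IQR of the box. Points beyond the whiskers are drawn individually, which makes box plots
a great way to spot potential outliers in your dataset.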
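The quantities a box plot draws can be computed directly with pandas. The sketch below uses a hypothetical sample with one unusually large value, computes the quartiles and the IQR, and applies the 1.5 × IQR rule (Matplotlib's default `whis=1.5`) to flag the points that would be drawn beyond the whiskers.

```python
import pandas as pd

# Hypothetical sample with one unusually large value
data = pd.Series([2.8, 3.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 4.4])

# First quartile, median, and third quartile (linear interpolation, pandas' default)
q1, median, q3 = data.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Points further than 1.5 * IQR from the box are treated as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(outliers.tolist())  # [4.4]
```

Calling `data.plot.box()` on the same Series should show 4.4 as an isolated point above the upper whisker.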

In [None]:
boxplot=df.boxplot(column=['Sepal.Length'])
print(df['Sepal.Length'].quantile([0.25,0.5,0.75]))
print(boxplot)

In [None]:
# Here we can see outliers: points drawn beyond the whiskers
boxplot = df.boxplot(column=['Sepal.Width'])
plt.show()

In [None]:
# Display a box plot for every numeric variable (the text column Species is skipped)
boxplot = df.boxplot()
plt.show()

## Line chart

A line chart displays a series of data points, usually ordered by their x-axis value, connected by straight line segments.
It shows how a variable changes over an ordered dimension such as time, and can also show how several variables
move together. This makes it a great fit for time series.
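For a time series, the natural setup in pandas is a Series indexed by dates: `Series.plot()` then draws a line chart ordered along the date axis. The sketch below uses hypothetical daily step counts created with `pd.date_range`; the dates and values are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily readings indexed by date
dates = pd.date_range("2024-01-01", periods=7, freq="D")
steps = pd.Series([5200, 6100, 4800, 7300, 6900, 8000, 7600], index=dates, name="steps")

# With a DatetimeIndex, Series.plot() draws a line chart ordered by date
ax = steps.plot(marker="o", title="Daily steps")
ax.set_xlabel("Date")
ax.set_ylabel("Steps")
plt.show()
```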


In [None]:
# Create a list of data to be represented on the x-axis
days = ['Saturday', 'Sunday', 'Monday', 'Tuesday',
        'Wednesday', 'Thursday', 'Friday']

# Create lists of data to be represented on the y-axis
calories = [1670, 2011, 1853, 2557,
            1390, 2118, 2063]
stress = [900, 300, 500, 100, 800, 500, 100]

# Create a DataFrame from the three lists
df_days_calories = pd.DataFrame(
    {'day': days, 'calories': calories, 'stress': stress})

In [None]:

# Use the DataFrame's plot() method: 'day' on the x-axis, 'calories' on the y-axis
df_days_calories.plot(x='day', y='calories')

In [None]:
# With no x or y given, every numeric column is plotted against the index
df_days_calories.plot()

In [None]:
# Draw both series on the same axes
fig, ax = plt.subplots()
df_days_calories.plot(x='day', y='stress', ax=ax)
df_days_calories.plot(x='day', y='calories', ax=ax)

# Set the title and y-axis label once both lines are drawn
ax.set_title("Analysis")
ax.set_ylabel("Variables")
plt.show()

## Scatter plot

A scatter plot shows the relationship between two numerical variables by drawing one point per observation.
For small datasets it can show trends much like a line chart, but with larger datasets its real strength
is revealing clusters or groups better than other chart types.
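The clustering strength mentioned above is easiest to see with more points than the seven days used below. The sketch generates two hypothetical groups of 50 random points around different centres with NumPy (a seeded generator, so the result is reproducible) and draws each group with Matplotlib's `ax.scatter` in its own colour.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Two hypothetical groups of 50 points, centred at (2, 2) and (5, 4)
group_a = rng.normal(loc=[2, 2], scale=0.4, size=(50, 2))
group_b = rng.normal(loc=[5, 4], scale=0.4, size=(50, 2))
points = pd.DataFrame(np.vstack([group_a, group_b]), columns=["x", "y"])
points["group"] = ["A"] * 50 + ["B"] * 50

# One scatter call per group so each cluster gets its own colour and legend entry
fig, ax = plt.subplots()
for name, subset in points.groupby("group"):
    ax.scatter(subset["x"], subset["y"], label=name, alpha=0.7)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(title="group")
plt.show()
```

Even without the colours, the two clouds of points would stand apart, which is the pattern a line chart of the same data would hide.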



In [None]:
# Draw a scatter plot 
df_days_calories.plot.scatter(x = 'day', y = 'calories')

In [None]:
# Create a scatter plot of calories per day
ax1 = df_days_calories.plot(kind='scatter', x='day', y='calories', color='r', label='Calories')

# Add a scatter plot of stress per day to the same axes
df_days_calories.plot(kind='scatter', x='day', y='stress', color='g', label='Stress', ax=ax1)

# Specify the x-axis and y-axis labels
ax1.set_xlabel('Days')
ax1.set_ylabel('Values')

# Give the plot a title
ax1.set_title("Analysis")

## Bar plot

A bar plot (or bar chart) represents categorical data with rectangular bars whose lengths
are proportional to the values they represent. Bars can be drawn vertically or horizontally.
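The two orientations map to pandas' `plot.bar` (vertical) and `plot.barh` (horizontal). The sketch below draws the same hypothetical category totals both ways, side by side; horizontal bars are often easier to read when category names are long.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical totals per category
sales = pd.Series({"North": 120, "South": 95, "East": 140, "West": 80})

# Same data, two orientations: bar height (vertical) vs bar width (horizontal)
fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(9, 3))
sales.plot.bar(ax=ax_v, title="Vertical (bar)")
sales.plot.barh(ax=ax_h, title="Horizontal (barh)")
plt.tight_layout()
plt.show()
```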


In [None]:

# Without an x column, the bars are labelled with the integer index (0-6)
df_days_calories.plot(kind="bar")

In [None]:
# Set 'day' as the index
df_days_calories.set_index('day', inplace=True)

In [None]:
df_days_calories.plot(kind="bar")

In [None]:
# Reset the index
df_days_calories.reset_index(inplace=True)

# Seaborn

Seaborn is a Python library for creating visually appealing and informative statistical graphics. It is built on
Matplotlib, enhancing its interface and offering more options for visualizing data, especially for statistical
analysis. Seaborn's seamless integration with pandas DataFrames makes it a favorite among data scientists and analysts.


In [None]:
import seaborn as sns

In [None]:
# Histogram: the Seaborn counterpart of df.hist(column='Sepal.Length')

In [None]:
sns.histplot(data=df, x='Sepal.Length')
plt.title('Distribution of Sepal.Length')
plt.xlabel('Sepal.Length')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

In [None]:
# Box plot of every numeric column in the DataFrame
sns.boxplot(data=df)

In [None]:
# line chart
sns.lineplot(x='day', y='stress', data=df_days_calories)
plt.title('Analysis')


In [None]:
# Add a title and axis labels
sns.lineplot(x='day', y='stress', data=df_days_calories).set(title='Test data', xlabel='Day of the week', ylabel='Stress Index')

In [None]:
# Multiple lines: label each line so the legend entries match the right series
ax = sns.lineplot(x='day', y='stress', data=df_days_calories, label='stress')
sns.lineplot(x='day', y='calories', data=df_days_calories, label='calories', ax=ax)
ax.set(title='Test data', xlabel='Day of the week', ylabel='Stress Index')
plt.legend(title='Variables')

plt.show()

In [None]:
# Scatter plot
sns.scatterplot(x="day", y="calories", data=df_days_calories)

In [None]:
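# Two scatter series on the same axes, labelled for the legend
sns.scatterplot(x="day", y="calories", data=df_days_calories, label="calories")
sns.scatterplot(x="day", y="stress", data=df_days_calories, label="stress")
plt.legend(title="Variables")
plt.show()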

In [None]:
## Bar plot

In [None]:
sns.barplot(x='day', y='stress', data=df_days_calories)
plt.show()

## Facet Grid

Seaborn's `FacetGrid` class maps a dataset onto multiple axes arranged in a grid of rows and columns that correspond to levels of variables in the dataset. This is great for exploratory data analysis of the variables used for modeling.
More on that later.
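The mechanics are: `FacetGrid` splits the data by the levels of the `col` (and optionally `row`) variable, creates one subplot per level, and `map_dataframe` repeats the same plotting call on each subset. The sketch below uses a small hypothetical DataFrame with three groups, so it should produce a 1 × 3 grid with one scatter per facet.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical measurements for three groups (four observations each)
data = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4 + ["c"] * 4,
    "width": [3.1, 3.4, 3.0, 3.6, 2.7, 2.9, 2.5, 3.0, 2.9, 3.2, 3.0, 2.8],
    "length": [5.0, 5.3, 4.8, 5.5, 6.0, 6.3, 5.7, 6.4, 6.6, 7.1, 6.9, 6.3],
})

# One column of subplots per level of "group"; the same plot is drawn on each subset
g = sns.FacetGrid(data, col="group", height=3)
g.map_dataframe(sns.scatterplot, x="width", y="length")
g.set_axis_labels("width", "length")
plt.show()
```

Adding `row=` with a second categorical column would extend this to a full grid of rows and columns.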

In [None]:
df.head()

In [None]:
g = sns.FacetGrid(df, col="Species")
g.map(sns.histplot, "Sepal.Width")

In [None]:
g = sns.FacetGrid(df, col="Species")
g.map(sns.scatterplot, "Sepal.Width", "Sepal.Length", alpha=.7)
g.add_legend()