# Learn Data Visualisation with me!!!
*Hey beginner, do you want to learn data visualization with me? Yes, let's go!!!
I am using the **Titanic dataset** to help you understand various data visualization techniques in data analytics.*

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**Importing libraries**
These are some of the most commonly used Python libraries for data visualization.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

**Read the data**

In [None]:
train=pd.read_csv('/kaggle/input/titanic/train.csv')

## 1. Bar Plot
This bar plot shows the survival rate of passengers in each class (1st, 2nd, 3rd) on the Titanic. It helps us understand if there was a significant difference in survival rates between different passenger classes.

In [None]:
plt.figure(figsize=(8, 5))
train.groupby('Pclass')['Survived'].mean().plot(kind='bar')
plt.title('Survival Rate by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
plt.show()
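
Why does `mean()` give a survival rate? `Survived` is stored as 0 or 1, so the average of that column within a class equals the share of passengers who survived. Here is a minimal sketch on a made-up table (the `toy` DataFrame and its six rows are invented for illustration, not taken from the Titanic data):

```python
import pandas as pd

# Hypothetical toy data, not the real Titanic file
toy = pd.DataFrame({
    'Pclass':   [1, 1, 2, 2, 3, 3],
    'Survived': [1, 1, 1, 0, 0, 0],
})

# The mean of a 0/1 column is the fraction of ones in each group
rate_by_class = toy.groupby('Pclass')['Survived'].mean()
print(rate_by_class)
```

The bar plot above draws exactly this kind of per-class average, only computed on the full dataset.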

## 2. Histogram
This histogram displays the age distribution of passengers on the Titanic. It helps us visualize the age demographics of passengers and identify any trends in the age distribution.

In [None]:
plt.figure(figsize=(8, 5))
sns.histplot(data=train, x='Age', bins=20, kde=True)
plt.title('Age Distribution of Passengers')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()
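
If you are wondering what a histogram actually counts, the sketch below does the binning by hand with `np.histogram`. The ages are made up for illustration, and four bins are used instead of the 20 above so the output stays readable:

```python
import numpy as np

# Hypothetical ages, invented for illustration
ages = np.array([2, 8, 19, 22, 25, 31, 38, 45, 62, 71])

# Split the range 0-80 into four equal-width bins and count the ages in each
counts, edges = np.histogram(ages, bins=4, range=(0, 80))

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f'{left:.0f}-{right:.0f}: {n} passengers')
```

Each bar in the histogram has a height equal to one of these counts. The smooth line added by `kde=True` is a separate density estimate drawn on top.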

## 3. Box Plot
This box plot illustrates the distribution of fares paid by passengers in different classes. It allows us to compare the fare distributions among the three passenger classes.

In [None]:
plt.figure(figsize=(8, 5))
sns.boxplot(data=train, x='Pclass', y='Fare')
plt.title('Fare Distribution by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Fare')
plt.show()
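
A box plot is a drawing of a few summary numbers. The sketch below computes them by hand for some made-up fares (the values are an assumption for illustration only), using the common 1.5 × IQR rule, which is also seaborn's default for the whiskers:

```python
import numpy as np

# Hypothetical fares, invented for illustration
fares = np.array([7.0, 8.0, 10.0, 13.0, 15.0, 26.0, 30.0, 52.0, 80.0, 512.0])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(fares, [25, 50, 75])
iqr = q3 - q1

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = fares[(fares < lower_fence) | (fares > upper_fence)]

print('Q1:', q1, 'median:', median, 'Q3:', q3)
print('Outliers:', outliers)
```

The whiskers stop at the most extreme fares that are still inside the fences, which is why very expensive tickets show up as separate dots.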

## 4. Scatter Plot
This scatter plot helps us explore the relationship between the age of passengers and the fare they paid. It can reveal patterns, such as whether older passengers tend to pay more for their tickets.

In [None]:
plt.figure(figsize=(8, 5))
plt.scatter(train['Age'], train['Fare'])
plt.title('Fare vs. Age')
plt.xlabel('Age')
plt.ylabel('Fare')
plt.show()

## 5. Interactive Scatter Plot
This interactive scatter plot provides the same information as the previous scatter plot but with the added interactivity of hovering over data points to see additional details.

In [None]:
fig = px.scatter(train, x='Age', y='Fare', color='Survived', hover_name='Name')
fig.update_layout(title='Fare vs. Age (Interactive)', xaxis_title='Age', yaxis_title='Fare')
fig.show()

## 6. Count Plot
This count plot shows the number of passengers who survived and did not survive, categorized by gender. It helps us understand the survival distribution among males and females.

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=train, x='Sex', hue='Survived')
plt.title('Survival Count by Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
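# Relabel seaborn's own legend; passing labels= to plt.legend can pair the text with the wrong bars
legend = plt.gca().get_legend()
legend.set_title('Survived')
for text, label in zip(legend.get_texts(), ['No', 'Yes']):
    text.set_text(label)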
plt.show()

## 7. Pair Plot
A pair plot is useful for visualizing relationships between numerical variables and identifying potential correlations. In this example, we're exploring how factors like age, fare, and family size relate to survival.

In [None]:
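g = sns.pairplot(train, hue='Survived')
# plt.title would only label the last subplot, so set a figure-level title instead
g.fig.suptitle('Pairwise Relationships', y=1.02)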
plt.show()

## 8. Heatmap
A heatmap displays the correlation between different numerical variables. In this case, it helps us identify which variables are strongly correlated and provides insights into potential multicollinearity.

In [None]:
plt.figure(figsize=(8, 6))
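# numeric_only=True skips text columns such as Name and Sex, which recent pandas versions refuse to correlate
correlation_matrix = train.corr(numeric_only=True)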
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()
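
Each cell in the heatmap is a Pearson correlation coefficient, which is the default method of `DataFrame.corr`. As a rough sketch of what that number means, here it is computed by hand for two short made-up arrays (`x`, `y`, and the helper `pearson` are hypothetical names used only for this illustration):

```python
import numpy as np

# Hypothetical values, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    numerator = (a_centered * b_centered).sum()
    denominator = np.sqrt((a_centered ** 2).sum() * (b_centered ** 2).sum())
    return numerator / denominator

r = pearson(x, y)
print(round(r, 4))

# NumPy's built-in gives the same value
print(round(np.corrcoef(x, y)[0, 1], 4))
```

Values close to 1 or -1 mean the two variables move together (or in opposite directions) almost linearly, and values near 0 mean there is little linear relationship.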

## 9. Pie Chart
A pie chart shows the distribution of passengers among different classes. It gives a clear visual representation of the proportion of passengers in each class.

In [None]:
plt.figure(figsize=(8, 5))
class_counts = train['Pclass'].value_counts()
plt.pie(class_counts, labels=class_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Passenger Class Distribution')
plt.axis('equal')
plt.show()

## 10. Interactive Bar Chart
This interactive bar chart displays the count of passengers who embarked from different points (S, C, Q) and color-codes them based on survival status. It allows you to interactively explore the data.

In [None]:
fig = px.bar(train, x='Embarked', color='Survived', title='Passenger Count by Embarkation Point',
             labels={'Embarked': 'Embarkation Point', 'count': 'Count'})
fig.show()

## 11. Violin Plot
A violin plot combines a box plot and a kernel density estimation to provide a richer view of the data distribution. In this example, it shows the age distribution by passenger class and survival status, allowing you to see both the central tendency and the spread of the data.

In [None]:
plt.figure(figsize=(10, 6))
sns.violinplot(data=train, x='Pclass', y='Age', hue='Survived', split=True)
plt.title('Age Distribution by Passenger Class and Survival')
plt.xlabel('Passenger Class')
plt.ylabel('Age')
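# Relabel seaborn's own legend so each label stays attached to the right colour
legend = plt.gca().get_legend()
legend.set_title('Survived')
for text, label in zip(legend.get_texts(), ['No', 'Yes']):
    text.set_text(label)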
plt.show()

## 12. Stacked Bar Chart
This stacked bar chart displays survival rates by passenger class and gender. The female and male rates are stacked within each class, so compare the size of each segment rather than the total bar height, because the sum of two rates has no direct meaning.

In [None]:
survival_by_class_gender = train.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack()
# DataFrame.plot opens its own figure, so pass figsize here instead of calling plt.figure first
survival_by_class_gender.plot(kind='bar', stacked=True, figsize=(10, 6))
plt.title('Survival Rate by Passenger Class and Gender')
plt.xlabel('Passenger Class')
plt.ylabel('Survival Rate')
plt.legend(title='Gender', loc='upper left')
plt.show()

## 13. Line Plot
A line plot is most often used to show how a value changes over time, but the Titanic dataset has no timestamp column. Instead, this line plot shows how the survival rate changes with age.

In [None]:
plt.figure(figsize=(10, 6))
train.groupby('Age')['Survived'].mean().plot()
plt.title('Survival Rate by Age')
plt.xlabel('Age')
plt.ylabel('Survival Rate')
plt.show()

## 14. 3D Scatter Plot
A 3D scatter plot lets you visualize three variables at once: age, fare, and survival. It can give insight into how these variables interact with each other. You can zoom and rotate the plot to see it more clearly.

In [None]:
fig = px.scatter_3d(train, x='Age', y='Fare', z='Survived', color='Survived')
fig.update_layout(title='3D Scatter Plot of Age, Fare, and Survival', scene=dict(zaxis_title='Survived'))
fig.show()

## 15. PairGrid
A PairGrid is a versatile tool for exploring relationships between multiple variables. In this example, it combines scatter plots, histograms, and kernel density plots to provide a comprehensive view of the data.

In [None]:
g = sns.PairGrid(train, hue='Survived')
g.map_upper(sns.scatterplot)
g.map_diag(sns.histplot)
g.map_lower(sns.kdeplot)
g.add_legend()
g.fig.suptitle('PairGrid for Detailed Exploration')
plt.show()

## 16. Pairwise Correlation Heatmap
This heatmap shows the pairwise correlation between numerical features (age, fare, siblings/spouses, parents/children). It helps identify relationships and potential multicollinearity between these variables.

In [None]:
numerical_features = train[['Age', 'Fare', 'SibSp', 'Parch']]
correlation_matrix = numerical_features.corr()
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', square=True)
plt.title('Pairwise Correlation Heatmap for Numerical Features')
plt.show()

## 17. Swarm Plot
A swarm plot displays individual data points along with their distributions. This example visualizes the age distribution by passenger class and gender, providing a detailed view of the data.

In [None]:
plt.figure(figsize=(10, 6))
sns.swarmplot(data=train, x='Pclass', y='Age', hue='Sex', dodge=True)
plt.title('Age Distribution by Passenger Class and Gender')
plt.xlabel('Passenger Class')
plt.ylabel('Age')
plt.legend(title='Sex', loc='upper right')
plt.show()

## 18. KDE (Kernel Density Estimation) Plot
A KDE plot visualizes the probability density of a continuous variable. In this example, it shows the distribution of fares for each passenger class, allowing you to see how fare values vary by class.

In [None]:
plt.figure(figsize=(10, 6))
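ax = sns.kdeplot(data=train, x='Fare', hue='Pclass', fill=True)
plt.title('Fare Distribution by Passenger Class')
plt.xlabel('Fare')
plt.ylabel('Density')
# seaborn already draws the hue legend; calling plt.legend here would replace it with an empty one
ax.get_legend().set_title('Passenger Class')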
plt.show()
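
To get a feel for what a KDE does, the sketch below builds one by hand: it places a small Gaussian bump on every data point and averages them. The sample fares and the bandwidth of 5 are assumptions chosen for illustration (seaborn picks its bandwidth automatically), and `gaussian_kde_1d` is a hypothetical helper, not a seaborn function:

```python
import numpy as np

# Hypothetical fares, invented for illustration
samples = np.array([7.0, 8.0, 13.0, 26.0, 30.0])
bandwidth = 5.0  # assumed smoothing width

def gaussian_kde_1d(grid, data, h):
    """Average of Gaussian bumps of width h, one centred on each data point."""
    z = (grid[:, None] - data[None, :]) / h
    bumps = np.exp(-0.5 * z ** 2) / (h * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

grid = np.linspace(-30, 70, 1001)
density = gaussian_kde_1d(grid, samples, bandwidth)

# A density curve should enclose a total area of about 1
area = (density * (grid[1] - grid[0])).sum()
print(round(area, 3))
```

A larger bandwidth gives a smoother, flatter curve, while a smaller one follows the individual points more closely.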

## 19. Radar Chart 
A radar chart is useful for comparing multiple attributes across different categories. In this example, it compares the average values of attributes (e.g., age, fare) for each passenger class.

In [None]:
import plotly.graph_objects as go

attributes = ['Age', 'Fare', 'SibSp', 'Parch']
average_values = train.groupby('Pclass')[attributes].mean().reset_index()

fig = go.Figure()

for index, row in average_values.iterrows():
    fig.add_trace(go.Scatterpolar(
        r=row[attributes].values,
        theta=attributes,
        fill='toself',
        name=f'Class {int(row["Pclass"])}'
    ))

fig.update_layout(
    polar=dict(
        radialaxis=dict(
            visible=True,
        )),
    title='Average Passenger Attributes by Class (Radar Chart)'
)
fig.show()

## 20. Word Cloud for Cabin Names using WordCloud Library
Just for fun. A word cloud visually represents the most frequently occurring words or phrases in a text. In this case, it shows the most common cabin names on the Titanic.

In [None]:
from wordcloud import WordCloud

cabin_text = ' '.join(train['Cabin'].dropna())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(cabin_text)

plt.figure(figsize=(10, 6))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Cabin Names')
plt.show()

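## 21. Sunburst Chart
A sunburst chart shows hierarchical data as concentric rings. In this example, the rings are gender, passenger class, and survival status. Each segment is sized by the number of survivors it contains, so groups with no survivors take up no space.
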
In [None]:
fig = px.sunburst(train, path=['Sex', 'Pclass', 'Survived'], values='Survived')
fig.update_layout(title='Survival Hierarchy by Gender and Class')
fig.show()

## 22. Parallel Coordinates Plot using Pandas and Matplotlib
A parallel coordinates plot is used to visualize multivariate data by displaying multiple attributes on parallel axes. In this example, it compares attributes for a subset of passengers, grouped by survival status.

In [None]:
from pandas.plotting import parallel_coordinates

attributes = ['Age', 'Fare', 'SibSp', 'Parch']
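# Drop rows with a missing Age, then sort so the legend always lists 0 before 1
subset_df = train[attributes + ['Survived']].dropna().sample(100).sort_values('Survived')
plt.figure(figsize=(12, 6))
parallel_coordinates(subset_df, 'Survived', colormap='coolwarm')
plt.title('Parallel Coordinates Plot of Passenger Attributes')
plt.xlabel('Attributes')
plt.ylabel('Values')
# Keep the automatic labels; overriding them can attach 'No'/'Yes' to the wrong lines
plt.legend(title='Survived (0 = No, 1 = Yes)')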
plt.show()

## 23. Pairwise Scatter Plot Matrix
A pairwise scatter plot matrix allows you to visualize relationships between pairs of numerical features.

In [None]:
numerical_features = train[['Age', 'Fare', 'SibSp', 'Parch']]
sns.pairplot(numerical_features, diag_kind='kde', markers='o')
plt.suptitle('Pairwise Scatter Plot Matrix for Numerical Features')
plt.show()

## 24. Treemap
A treemap is a hierarchical visualization that displays data as nested rectangles. This example shows the passenger count by passenger class and gender in a visually appealing way.

In [None]:
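# With no values argument, Plotly Express sizes each box by its number of rows (passengers).
# values='PassengerId' would add up the ID numbers instead of counting passengers.
fig = px.treemap(train, path=['Pclass', 'Sex'],
                 title='Passenger Count by Class and Gender (Treemap)')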
fig.show()

## 25. Donut Plot
A donut plot is a variation of a pie chart with a hole in the center. In this example, it shows the distribution of survivors by gender, making it easy to compare the proportions.

In [None]:
survivors_by_gender = train.groupby('Sex')['Survived'].sum()
labels = survivors_by_gender.index
sizes = survivors_by_gender.values

fig, ax = plt.subplots()
ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, wedgeprops=dict(width=0.4))
circle = plt.Circle((0, 0), 0.3, color='white')
ax.add_artist(circle)
plt.title('Distribution of Survivors by Gender')
plt.axis('equal')
plt.show()

## 26. Lollipop Plot
A lollipop plot is a variation of a bar chart that uses line segments with circular markers to show data points. In this example, it visualizes the average fare for each passenger class.

In [None]:
avg_fare_by_class = train.groupby('Pclass')['Fare'].mean().reset_index()

plt.figure(figsize=(10, 6))
plt.stem(avg_fare_by_class['Pclass'], avg_fare_by_class['Fare'], basefmt=' ')
plt.title('Average Fare by Passenger Class (Lollipop Plot)')
plt.xlabel('Passenger Class')
plt.ylabel('Average Fare')
plt.xticks(avg_fare_by_class['Pclass'])
plt.show()

## 27. Step Plot
A step plot is similar to a line plot but emphasizes the discrete nature of the x-axis values. In this example, it visualizes the survival rate by age group.

In [None]:
age_bins = pd.cut(train['Age'], bins=range(0, 101, 10))
# observed=False keeps empty age groups and avoids a pandas FutureWarning
survival_by_age = train.groupby(age_bins, observed=False)['Survived'].mean()

age_bins_str = [str(interval) for interval in survival_by_age.index]

plt.figure(figsize=(10, 6))
plt.step(age_bins_str, survival_by_age, where='mid')
plt.title('Survival Rate by Age Group (Step Plot)')
plt.xlabel('Age Group')
plt.ylabel('Survival Rate')
plt.xticks(rotation=45)
plt.show()
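
The age groups above come from `pd.cut`. Here is a small sketch of how it assigns values to bins, using a handful of made-up ages (an assumption for illustration). By default each bin includes its right edge but not its left edge, so an age of exactly 10 lands in `(0, 10]` and an age of 11 in `(10, 20]`:

```python
import pandas as pd

# Hypothetical ages, invented for illustration
ages = pd.Series([5, 10, 11, 29, 30, 64])

# Same bin edges as above: 0, 10, 20, ..., 100
groups = pd.cut(ages, bins=range(0, 101, 10))

for age, group in zip(ages, groups):
    print(age, '->', group)
```

Because the left edge is excluded, an age of exactly 0 would not fall into any bin and would come back as missing.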

## 28. Scatter Plot Matrix
A scatter plot matrix displays pairwise scatter plots for selected numerical features. It helps visualize relationships between variables and distributions along the diagonal.

In [None]:
selected_features = ['Age', 'Fare', 'SibSp', 'Parch', 'Survived']
scatter_matrix = pd.plotting.scatter_matrix(train[selected_features], figsize=(12, 8), diagonal='hist', marker='o', alpha=0.5)
plt.suptitle('Scatter Plot Matrix for Selected Features')
plt.show()

## 29. Hexbin Plot
A hexbin plot is useful for visualizing the density of data points in a 2D space. In this example, it shows the distribution of passengers based on age and fare, with color indicating density.

In [None]:
plt.figure(figsize=(10, 6))
plt.hexbin(train['Age'], train['Fare'], gridsize=20, cmap='YlGnBu', bins='log')
plt.colorbar(label='Log Count')
plt.title('Hexbin Plot of Age vs. Fare')
plt.xlabel('Age')
plt.ylabel('Fare')
plt.show()

## 30. Polar Scatter Plot
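A polar scatter plot is used to visualize data in polar coordinates. In this example, each passenger's age and fare are converted to an angle (the arctangent of age over fare) and a radius (the distance from the origin). The angle therefore reflects the balance between age and fare, not a real direction of travel.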

In [None]:
theta = np.arctan2(train['Age'], train['Fare'])
r = np.sqrt(train['Age']**2 + train['Fare']**2)

plt.figure(figsize=(8, 8))
plt.polar(theta, r, 'bo', alpha=0.5)
plt.title('Polar Scatter Plot of Age vs. Fare')
plt.show()
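
The conversion used above can be checked on a few made-up points. This is only a sketch with invented ages and fares: it computes the angle and radius the same way, then converts back to confirm nothing is lost.

```python
import numpy as np

# Hypothetical ages and fares, invented for illustration
age = np.array([30.0, 40.0, 3.0])
fare = np.array([40.0, 30.0, 4.0])

# Fare plays the role of x and age the role of y
theta = np.arctan2(age, fare)   # angle measured from the fare axis, in radians
r = np.hypot(age, fare)         # same as sqrt(age**2 + fare**2)

# Converting back recovers the original values
fare_back = r * np.cos(theta)
age_back = r * np.sin(theta)

print(r)
print(np.degrees(theta).round(1))
```

Points with a high fare relative to age sit near an angle of 0, and points with a low fare relative to age sit closer to 90 degrees.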

## 31. Funnel Chart
A funnel chart is normally used to visualize a multi-step process or progression. The Titanic data has no such process, so this example simply plots the number of unique values in each of five columns.

In [None]:
survival_stages = ['Embarked', 'Pclass', 'Sex', 'Age', 'Fare']
# nunique() counts the distinct values in each column
survival_counts = train[survival_stages].nunique().reset_index(name='Count')

fig = px.funnel(survival_counts, x='Count', y='index', title='Unique Values per Column (Funnel Chart)')
fig.show()

## 32. Venn Diagram
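A Venn diagram is used to visualize the intersections of sets. In this example, the two sets are the male survivors and the female survivors. No passenger belongs to both, so the circles do not overlap and the diagram only compares the sizes of the two groups.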

In [None]:
from matplotlib_venn import venn2

survived_male = set(train[(train['Survived'] == 1) & (train['Sex'] == 'male')]['PassengerId'])
survived_female = set(train[(train['Survived'] == 1) & (train['Sex'] == 'female')]['PassengerId'])

plt.figure(figsize=(8, 6))
venn2([survived_male, survived_female], ('Survived Males', 'Survived Females'))
plt.title('Venn Diagram of Survived Males and Females')
plt.show()


## 33. Sunburst Treemap
A sunburst treemap is a hierarchical chart that allows you to explore data with multiple levels of categories. In this example, it visualizes the hierarchical analysis of passenger class, gender, and survival.

In [None]:
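# With no values argument, each segment is sized by its number of passengers
# (values='PassengerId' would add up the ID numbers instead)
fig = px.sunburst(train, path=['Pclass', 'Sex', 'Survived'],
                 title='Hierarchical Analysis of Class, Gender, and Survival (Sunburst Treemap)')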
fig.show()

## 34. Violin Swarm Plot
This plot combines a violin plot with a view of the individual observations. It visualizes the fare distribution by passenger class and gender. The `inner='stick'` option draws one line per passenger inside each violin, which plays the role of the points in a swarm plot.

In [None]:
plt.figure(figsize=(12, 6))
sns.violinplot(data=train, x='Pclass', y='Fare', hue='Sex', split=True, inner='stick')
plt.title('Fare Distribution by Passenger Class and Gender')
plt.xlabel('Passenger Class')
plt.ylabel('Fare')
plt.legend(title='Gender')
plt.show()

*I will be adding other plots when I study them. So, stay tuned!!!*