<a href="https://colab.research.google.com/github/RohitMangale/ML/blob/main/pythonLibraries/Plotly.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Importing necessary libraries

In [None]:
import plotly.express as px
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import plotly.graph_objects as go
import plotly.figure_factory as ff

# 1. Line Chart

In [None]:
healthexp = sns.load_dataset('healthexp')
display(healthexp.head())

fig = px.line(healthexp, x='Year', y='Life_Expectancy', color='Country')
fig.show()

Unnamed: 0,Year,Country,Spending_USD,Life_Expectancy
0,1970,Germany,252.311,70.6
1,1970,France,192.143,72.2
2,1970,Great Britain,123.993,71.9
3,1970,Japan,150.437,72.0
4,1970,USA,326.961,70.9


## Interpretation

1. **Overall Trend:** Life expectancy rose in every country in the dataset between 1970 and 2020, consistent with improvements in healthcare, living conditions, and public health.
2. **Country Variations:** While the general trend is upward, there are variations between countries. Japan has the highest life expectancy for most of the period, although in 1970 France (72.2) is marginally ahead of Japan (72.0). Canada and the USA experienced a slight decline in the late 2000s before rebounding.
3. **Spread Between Countries:** In 1970 the five countries shown above lie within 1.6 years of each other (70.6 to 72.2). The gap between the highest and lowest lines widens in later decades as the USA falls behind the other countries, so the chart does not support a reading of convergence.
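
The spread between countries can be checked numerically rather than by eye. The following is a minimal sketch, assuming the same column names as `healthexp`. The helper `life_expectancy_spread` is hypothetical, and the inline frame holds only the five 1970 rows shown above so that the example runs without downloading the dataset.

```python
import pandas as pd

def life_expectancy_spread(df: pd.DataFrame) -> pd.Series:
    """Return, per year, the gap between the highest and lowest life expectancy."""
    by_year = df.groupby("Year")["Life_Expectancy"]
    return by_year.max() - by_year.min()

# The five 1970 rows from the output above (not the full dataset).
sample = pd.DataFrame({
    "Year": [1970, 1970, 1970, 1970, 1970],
    "Country": ["Germany", "France", "Great Britain", "Japan", "USA"],
    "Life_Expectancy": [70.6, 72.2, 71.9, 72.0, 70.9],
})

spread = life_expectancy_spread(sample)
print(spread)
```

Applied to the full `healthexp` frame, a series that shrinks over time would indicate convergence and one that grows would indicate divergence.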

# 2. Scatter Plot

In [None]:
flights = sns.load_dataset("flights")
display(flights.head())

px.scatter(flights, x="year", y="passengers", color="month")

Unnamed: 0,year,month,passengers
0,1949,Jan,112
1,1949,Feb,118
2,1949,Mar,132
3,1949,Apr,129
4,1949,May,121


## Interpretation

1. **Seasonal Variation:** The number of passengers fluctuates significantly throughout the year, with peaks in summer months (June-August) and troughs in winter months (December-February). This suggests a strong seasonal influence on travel patterns.
2. **Overall Trend:** There's a clear upward trend in the number of passengers over the period covered by the data, which starts in 1949 and runs to 1960, indicating increasing demand for air travel.
3. **Year-to-Year Fluctuations:** While the overall trend is upward, there are year-to-year variations in passenger numbers. Some years experience higher peaks and deeper troughs than others, suggesting factors beyond the general seasonal pattern.






# 3. Bar Charts

In [None]:
healthexp = sns.load_dataset('healthexp')
display(healthexp.head())

fig = px.bar(healthexp, x='Year', y='Life_Expectancy', color='Country',
             hover_data=['Life_Expectancy', 'Country'])
fig.show()

Unnamed: 0,Year,Country,Spending_USD,Life_Expectancy
0,1970,Germany,252.311,70.6
1,1970,France,192.143,72.2
2,1970,Great Britain,123.993,71.9
3,1970,Japan,150.437,72.0
4,1970,USA,326.961,70.9


## Interpretation


1. **Overall Trend:** Each country's segment grows slightly taller from 1970 to 2020, reflecting rising life expectancy. The total bar height is the sum of the countries' life expectancies for that year. That sum is not a meaningful quantity in itself, and it depends on how many countries have a value for that year.
2. **Country Segments:** Life expectancy is not additive, so the segments are best read individually rather than as contributions to a total. Segments of similar height indicate similar life expectancy, and hovering over a segment shows the exact value and country.
3. **Comparing Countries:** Within a year the segments differ by only a few years of life expectancy, so gaps between countries are hard to judge from stacked bars. The line chart in section 1 is better suited to comparing countries over time.




# 4. Pie Chart

In [None]:
titanic = sns.load_dataset("titanic")
display(titanic.head())

fig = px.pie(titanic, values='fare', names='class')
fig.update_layout(width=500, height=500)
fig.show()

Unnamed: 0,survived,pclass,sex,age,sibsp,parch,fare,embarked,class,who,adult_male,deck,embark_town,alive,alone
0,0,3,male,22.0,1,0,7.25,S,Third,man,True,,Southampton,no,False
1,1,1,female,38.0,1,0,71.2833,C,First,woman,False,C,Cherbourg,yes,False
2,1,3,female,26.0,0,0,7.925,S,Third,woman,False,,Southampton,yes,True
3,1,1,female,35.0,1,0,53.1,S,First,woman,False,C,Southampton,yes,False
4,0,3,male,35.0,0,0,8.05,S,Third,man,True,,Southampton,no,True


## Interpretation

1. **Dominance of "First":** The largest segment, "First," covers 63.3% of the pie. Because the chart uses `values='fare'`, this means first-class passengers paid 63.3% of the total fare, not that they made up 63.3% of the passengers.
2. **Smaller Proportions:** "Third" accounts for 23.4% and "Second" for 13.2% of the total fare.
3. **Distribution:** Fare revenue is concentrated in first class. The shares reflect both the number of passengers in each class and the ticket prices, so they should not be read as passenger counts.
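
The aggregation behind the pie chart can be reproduced with a group-by. This is a minimal sketch using only the five Titanic rows shown above, so its percentages differ from the 63.3% / 23.4% / 13.2% of the full dataset. The variable names are illustrative.

```python
import pandas as pd

# The five Titanic rows shown above (class and fare only), not the full dataset.
sample = pd.DataFrame({
    "class": ["Third", "First", "Third", "First", "Third"],
    "fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
})

# px.pie(values='fare', names='class') sums `fare` per class before drawing;
# this reproduces that aggregation and converts it to percentages.
fare_by_class = sample.groupby("class")["fare"].sum()
share_pct = 100 * fare_by_class / fare_by_class.sum()
print(share_pct.round(1))
```

Replacing `sample` with the full `titanic` frame should give the shares displayed on the chart.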


# 5. Bubble Chart

In [None]:
healthexp = sns.load_dataset("healthexp")
display(healthexp.head())


fig = px.scatter(
    healthexp.query('Year >= 2000'),
    x='Spending_USD', y='Life_Expectancy',
    size='Spending_USD', color='Country', hover_name='Country',
    log_x=True, size_max=20,
)
fig.update_layout(width=700, height=500)
fig.show()


Unnamed: 0,Year,Country,Spending_USD,Life_Expectancy
0,1970,Germany,252.311,70.6
1,1970,France,192.143,72.2
2,1970,Great Britain,123.993,71.9
3,1970,Japan,150.437,72.0
4,1970,USA,326.961,70.9


## Interpretation

1. **Positive Correlation:** There seems to be a general positive correlation between life expectancy and spending per capita (USD). As spending increases, life expectancy tends to rise, suggesting a potential relationship between economic factors and health outcomes.
2. **Country Clusters:** The data points are clustered based on country, indicating that there might be regional or cultural factors influencing the relationship between spending and life expectancy. For example, some countries might have higher spending but similar life expectancy due to different healthcare systems or socioeconomic factors.
3. **Outliers:** There are a few outliers, especially among the higher-spending countries, suggesting that other factors beyond spending might influence life expectancy. These outliers could be due to specific policies, cultural practices, or other variables not captured in this dataset.


# 6. Sunburst Charts

In [None]:


diamonds = sns.load_dataset('diamonds')
display(diamonds.head())

fig = px.sunburst(diamonds, path=['clarity','color','cut'],values='price')

fig.update_layout(width=700, height=700)

fig.show()

Unnamed: 0,carat,cut,color,clarity,depth,table,price,x,y,z
0,0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
1,0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
2,0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
3,0.29,Premium,I,VS2,62.4,58.0,334,4.2,4.23,2.63
4,0.31,Good,J,SI2,63.3,58.0,335,4.34,4.35,2.75










## Interpretation

1. **Dominant Category:** The widest sectors mark the combinations with the largest total price, which are not necessarily the most common diamonds. The rings follow the `path` order: clarity in the inner ring, then color, then cut in the outer ring, so "Ideal" is a cut grade and "G" and "D" are color grades.
2. **Hierarchical Structure:** Each clarity grade (e.g., "VS2," "SI1") is divided into color grades (e.g., "G," "H," "D"), and each color grade is divided again by cut, so the chart shows how total price is distributed across nested attribute combinations.
3. **Color Distribution:** The fill colors distinguish the top-level clarity sectors, and nested sectors inherit the color of their parent. They do not encode the diamond color grade, which appears as the labels of the middle ring.
4. **Value Distribution:** Sector size encodes the summed `price` of the diamonds in that combination (`values='price'`). A large sector can therefore result from many moderately priced diamonds as well as from a few expensive ones.
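
Each ring of the sunburst corresponds to a group-by over a prefix of the `path` list. The sketch below illustrates this with only the five diamond rows shown above; the variable names are illustrative and the sums are not those of the full dataset.

```python
import pandas as pd

# The five diamond rows shown above (path columns and price only).
sample = pd.DataFrame({
    "clarity": ["SI2", "SI1", "VS1", "VS2", "SI2"],
    "color": ["E", "E", "E", "I", "J"],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good"],
    "price": [326, 326, 327, 334, 335],
})

# Inner ring: total price per clarity grade.
inner_ring = sample.groupby("clarity")["price"].sum()
# Outer ring: total price per full clarity/color/cut combination.
outer_ring = sample.groupby(["clarity", "color", "cut"])["price"].sum()
print(inner_ring)
print(outer_ring)
```

Both rings add up to the same grand total, which is why every ring of the chart spans the full circle.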


# 7. Continuous Error Bands


In [None]:
import plotly.graph_objects as go
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/wind_speed_laurel_nebraska.csv')

fig = go.Figure([
    go.Scatter(
        name='Measurement',
        x=df['Time'],
        y=df['10 Min Sampled Avg'],
        mode='lines',
        line=dict(color='rgb(31, 119, 180)'),
    ),
    go.Scatter(
        name='Upper Bound',
        x=df['Time'],
        y=df['10 Min Sampled Avg']+df['10 Min Std Dev'],
        mode='lines',
        marker=dict(color="#444"),
        line=dict(width=0),
        showlegend=False
    ),
    go.Scatter(
        name='Lower Bound',
        x=df['Time'],
        y=df['10 Min Sampled Avg']-df['10 Min Std Dev'],
        marker=dict(color="#444"),
        line=dict(width=0),
        mode='lines',
        fillcolor='rgba(68, 68, 68, 0.3)',
        fill='tonexty',
        showlegend=False
    )
])
fig.update_layout(
    yaxis_title='Wind speed (m/s)',
    title='Continuous, variable value error bars',
    hovermode="x"
)
fig.update_layout(width=800, height=500)
fig.show()

## Interpretation

1. **Trend:** The wind speed fluctuates over the two-day period. There are periods of increased wind speed followed by periods of decreased wind speed.
2. **Error Bands:** The shaded area spans the 10-minute average plus and minus one standard deviation (`10 Min Std Dev`). A wider band means the wind speed varied more within that 10-minute window.
3. **Overall Variability:** The wind speed exhibits significant variability over the time period, with both high and low wind speeds observed. This suggests a dynamic weather pattern.


# 8. Box Plots

In [None]:
taxis = sns.load_dataset("taxis")
display(taxis.head())

fig = px.box(taxis,x = 'payment',y = 'passengers',color='color')
fig.update_layout(width=700, height=500)
fig.show()

Unnamed: 0,pickup,dropoff,passengers,distance,fare,tip,tolls,total,color,payment,pickup_zone,dropoff_zone,pickup_borough,dropoff_borough
0,2019-03-23 20:21:09,2019-03-23 20:27:24,1,1.6,7.0,2.15,0.0,12.95,yellow,credit card,Lenox Hill West,UN/Turtle Bay South,Manhattan,Manhattan
1,2019-03-04 16:11:55,2019-03-04 16:19:00,1,0.79,5.0,0.0,0.0,9.3,yellow,cash,Upper West Side South,Upper West Side South,Manhattan,Manhattan
2,2019-03-27 17:53:01,2019-03-27 18:00:25,1,1.37,7.5,2.36,0.0,14.16,yellow,credit card,Alphabet City,West Village,Manhattan,Manhattan
3,2019-03-10 01:23:59,2019-03-10 01:49:51,1,7.7,27.0,6.15,0.0,36.95,yellow,credit card,Hudson Sq,Yorkville West,Manhattan,Manhattan
4,2019-03-30 13:27:42,2019-03-30 13:37:14,3,2.16,9.0,1.1,0.0,13.4,yellow,credit card,Midtown East,Yorkville West,Manhattan,Manhattan


## Interpretation

1. **Payment Method Comparison:** The box plot compares the distribution of passengers for two payment methods: credit card and cash.
2. **Median Passengers:** The median number of passengers for both credit card and cash payments is around 2. This suggests that the typical number of passengers per transaction is similar for both methods.
3. **Distribution:** The box plots show the spread of passenger numbers for each payment method. Credit card transactions have a slightly wider range (indicated by the longer whiskers), suggesting more variability in the number of passengers per transaction.
4. **Outliers:** There are a few outliers, especially for cash payments, indicating transactions with significantly fewer or more passengers than the typical range.
5. **Color Differences:** The box colors correspond to the taxi type recorded in the `color` column (for example, yellow in the rows shown above), because the plot is created with `color='color'`. Comparing the boxes within each payment method shows whether passenger counts differ by taxi type.


# 9. Error Bars

In [None]:
import seaborn as sns
import plotly.graph_objects as go

# Load the dataset
car_crashes = sns.load_dataset('car_crashes')

# The error bars are drawn on the y-axis, so use the spread of the y column (speeding)
speeding_std = car_crashes['speeding'].std()

# One constant error value per point
error_y_array = [speeding_std] * len(car_crashes)

fig = go.Figure()
fig.add_trace(go.Scatter(
    x=car_crashes['total'],
    y=car_crashes['speeding'],
    mode='markers+lines',
    error_y=dict(
        type='data',
        array=error_y_array,
        color='black'
    )
))

fig.update_layout(
    title='Car Crashes: Total vs. Speeding',
    xaxis_title='Total Crashes',
    yaxis_title='Speeding Crashes'
)
fig.update_layout(width=700, height=500)
fig.show()

## Interpretation

1. **Scatter Plot with Error Bars:** The plot displays a scatter plot of total crashes versus speeding crashes, with error bars indicating uncertainty in the data.
2. **Positive Correlation:** There seems to be a general positive correlation between total crashes and speeding crashes, suggesting that as the number of total crashes increases, the number of speeding crashes tends to increase as well.
3. **Scatter:** The data points are scattered, indicating some variability in the relationship. While there's a general trend, there are also instances where the number of speeding crashes doesn't increase proportionally with total crashes.
4. **Error Bars:** Every point carries the same error bar, one standard deviation computed over the whole dataset, so the bars illustrate the overall spread rather than the uncertainty of individual measurements.
5. **Outliers:** There are a few outliers, which are data points that deviate significantly from the general trend. These outliers might represent unusual cases or anomalies that could be further investigated.



# 10. Violin Plot

In [None]:

healthexp = sns.load_dataset("healthexp")
display(healthexp.head())
fig = px.violin(healthexp, y="Spending_USD", x="Country", box=True, points="all",
          hover_data=healthexp.columns)
fig.update_layout(width=800, height=500)
fig.show()

Unnamed: 0,Year,Country,Spending_USD,Life_Expectancy
0,1970,Germany,252.311,70.6
1,1970,France,192.143,72.2
2,1970,Great Britain,123.993,71.9
3,1970,Japan,150.437,72.0
4,1970,USA,326.961,70.9


## Interpretation

1. **Distribution Comparison:** The violin plot compares the distribution of spending per capita (USD) across six countries. The shape of each violin reveals the density of data points within specific spending ranges.
2. **Median Spending:** The horizontal line within each violin represents the median spending. Germany and the USA have higher median spending compared to the other countries.
3. **Variation:** The vertical extent of each violin shows the range of spending within a country, while its width shows how many observations fall at each spending level. Canada and the USA show the highest variation in spending, while France and Great Britain have more compact distributions.


# 11. 2D Histograms or Density Heatmaps

In [None]:
car_crashes = sns.load_dataset('car_crashes')
fig = px.density_heatmap(car_crashes, x="total", y="alcohol")
fig.update_layout(width=800, height=500)
fig.show()

## Interpretation

1. **Distribution:** The heatmap shows the distribution of counts across different combinations of "total" and "alcohol" values. The color intensity represents the frequency of occurrences within each cell.
2. **Relationship:** There seems to be a positive relationship between "total" and "alcohol." As the values of "total" increase, the counts tend to concentrate in the higher "alcohol" ranges, indicating a correlation between the two variables.
3. **Hotspots:** The yellow and orange regions in the upper right corner represent areas with the highest counts, suggesting that combinations of higher "total" and "alcohol" values are more frequent.
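
The counts behind a density heatmap come from binning the points on a two-dimensional grid. The sketch below shows that binning with `numpy.histogram2d`. The data are synthetic stand-ins for the `total` and `alcohol` columns, and the 5 x 5 grid is an arbitrary choice, not the binning Plotly picks.

```python
import numpy as np

rng = np.random.default_rng(0)
# 51 synthetic points standing in for the `total` and `alcohol` columns.
total = rng.normal(loc=15.0, scale=4.0, size=51)
alcohol = 0.3 * total + rng.normal(scale=1.0, size=51)

# Count how many points fall in each (total, alcohol) cell of a 5 x 5 grid.
counts, total_edges, alcohol_edges = np.histogram2d(total, alcohol, bins=5)
print(counts.astype(int))
```

Each cell of `counts` plays the role of one colored tile in the heatmap, with larger counts mapped to the brighter end of the color scale.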


# 12. 3D Surface Plots

In [None]:
import plotly.graph_objects as go
import pandas as pd
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')
fig = go.Figure(data=[go.Surface(z=z_data.values)])
fig.update_layout(title='Mt Bruno Elevation', autosize=False,
                  width=500, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

## Interpretation


1. **Topographic Map:** The 3D surface plot represents the elevation of Mt. Bruno. The height (z-axis) is the elevation. Because only `z` is passed to `go.Surface`, the x and y axes are the column and row indices of the elevation grid rather than geographical coordinates.
2. **Peak and Valleys:** The plot shows the mountain's peaks and valleys. The highest point (peak) is represented by the yellow region, while the lower areas are shown in purple and blue.
3. **Slope and Terrain:** The steepness of the slopes can be inferred from the color gradient. Steeper slopes are represented by more rapid changes in color, while gentler slopes have more gradual color transitions. This visualization helps understand the terrain and its features.



# 13. Candlestick Charts

In [None]:
import plotly.graph_objects as go
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/finance-charts-apple.csv')

fig = go.Figure(data=[go.Candlestick(x=df['Date'],
                open=df['AAPL.Open'],
                high=df['AAPL.High'],
                low=df['AAPL.Low'],
                close=df['AAPL.Close'])])
fig.update_layout(width=800, height=500)

fig.show()

## Interpretation

1. **Price Fluctuations:** The chart shows the price of Apple (AAPL) stock, as the column names `AAPL.Open`, `AAPL.High`, `AAPL.Low`, and `AAPL.Close` indicate, over a period from approximately April 2015 to January 2017.
2. **Candlestick Pattern:** The chart uses candlestick patterns, where each candle represents a specific time period (likely a day). The color of the candle (green or red) indicates whether the price closed higher or lower than the opening price. The lines extending from the candles represent the high and low prices during that period.
3. **Trend Analysis:** By observing the overall trend of the candlesticks, you can identify periods of bullish (upward) and bearish (downward) markets. In this chart, there appears to be a general upward trend, with some periods of volatility or sideways movement.
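
The coloring rule for a candle reduces to a comparison of its open and close prices. The following is a small sketch; `candle_direction` is a hypothetical helper and the prices are made up for illustration. With Plotly's default settings, increasing candles are drawn in green and decreasing ones in red.

```python
def candle_direction(open_price: float, close_price: float) -> str:
    """Classify a candle by comparing its close with its open."""
    if close_price > open_price:
        return "increasing"
    if close_price < open_price:
        return "decreasing"
    return "flat"

# Hypothetical prices, for illustration only.
print(candle_direction(100.0, 103.5))  # increasing
print(candle_direction(103.5, 101.0))  # decreasing
```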



# 14. Gantt Chart

In [None]:
import plotly.express as px
import pandas as pd

df = pd.DataFrame([
    dict(Task="Job A", Start='2009-01-01', Finish='2009-02-28'),
    dict(Task="Job B", Start='2009-03-05', Finish='2009-04-15'),
    dict(Task="Job C", Start='2009-02-20', Finish='2009-05-30')
])

fig = px.timeline(df, x_start="Start", x_end="Finish", y="Task")
fig.update_yaxes(autorange="reversed") # otherwise tasks are listed from the bottom up
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation

1. **Project Timeline:** The Gantt chart visually represents the timeline of three tasks: Job A, Job B, and Job C. The horizontal axis represents time, spanning from January 2009 to June 2009.
2. **Task Duration:** The length of each bar indicates the duration of the corresponding task. Job B is the shortest (41 days), Job A runs for 58 days, and Job C is the longest (99 days).
3. **Task Sequence:** Job A starts first and finishes on 2009-02-28. Job C starts on 2009-02-20, before Job A ends, and runs until 2009-05-30, so it overlaps both Job A and Job B (2009-03-05 to 2009-04-15). The tasks finish in the order A, B, C, but the chart does not show any dependencies between them.
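
The durations and overlaps can be computed directly from the task table instead of being read off the bars. This is a minimal sketch on the same three tasks; the `Days` column and the `overlaps` helper are additions for illustration.

```python
import pandas as pd

df = pd.DataFrame([
    dict(Task="Job A", Start="2009-01-01", Finish="2009-02-28"),
    dict(Task="Job B", Start="2009-03-05", Finish="2009-04-15"),
    dict(Task="Job C", Start="2009-02-20", Finish="2009-05-30"),
])
df["Start"] = pd.to_datetime(df["Start"])
df["Finish"] = pd.to_datetime(df["Finish"])

# Duration of each task in days.
df["Days"] = (df["Finish"] - df["Start"]).dt.days

def overlaps(a: pd.Series, b: pd.Series) -> bool:
    """Two tasks overlap when each starts before the other finishes."""
    return a["Start"] < b["Finish"] and b["Start"] < a["Finish"]

job_a, job_b, job_c = (df.iloc[i] for i in range(3))
print(df[["Task", "Days"]])
print(overlaps(job_a, job_c), overlaps(job_b, job_c), overlaps(job_a, job_b))
```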


# 15. Multi-Distplot

In [None]:
import plotly.figure_factory as ff
import numpy as np
x1 = np.random.randn(200) - 2
x2 = np.random.randn(200)
x3 = np.random.randn(200) + 2
x4 = np.random.randn(200) + 4
hist_data = [x1, x2, x3, x4]
group_labels = ['Group 1', 'Group 2', 'Group 3', 'Group 4']
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2)
fig.update_layout(width=700, height=400)
fig.show()

## Interpretation

1. **Distribution Comparison:** The distplot compares four groups (Group 1, Group 2, Group 3, and Group 4). Each group is drawn as a histogram, a smoothed density curve, and a rug of individual points.
2. **Central Tendency and Spread:** The groups are standard normal samples shifted by -2, 0, +2, and +4, so their centers are evenly spaced about 2 units apart and all four have a standard deviation close to 1. Any visible difference in spread is sampling noise from the 200 draws per group.
3. **Overlapping Distributions:** Neighboring groups overlap substantially because their centers are only about 2 standard deviations apart, whereas Group 1 and Group 4, about 6 standard deviations apart, barely overlap.
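
The centers and spreads of the groups can be confirmed from the samples themselves. The sketch below rebuilds the four groups with a seeded `numpy.random.default_rng` generator (a substitution for the unseeded `np.random.randn` calls in the cell above, so the numbers are reproducible) and prints each group's mean and standard deviation.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded so the numbers are reproducible
offsets = {"Group 1": -2, "Group 2": 0, "Group 3": 2, "Group 4": 4}

# Same construction as the cell above: standard normal samples plus a constant shift.
samples = {label: rng.standard_normal(200) + shift for label, shift in offsets.items()}

for label, values in samples.items():
    print(f"{label}: mean={values.mean():.2f}, std={values.std(ddof=1):.2f}")
```

The means should land near the offsets and the standard deviations near 1, with small deviations that change from seed to seed.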


# 16. Funnel Chart

In [None]:
from plotly import graph_objects as go
colors = ["gold", "gold", "lightgreen", "lavender"]
fig = go.Figure(
    go.Funnelarea(
        labels=["Interview 1", "Interview 2", "Test", "Final Stage"],
        values=[100, 70, 40, 20],
        textfont_size=20,
        marker=dict(colors=colors, pattern=dict(shape=["", "/", "", ""])),
    )
)
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation

1. **Conversion Rates:** The funnel shows the number of candidates at each stage of a recruitment process: 100, 70, 40, and 20. The percentages displayed on the chart (43.5%, 30.4%, 17.4%, and 8.7%) are each stage's share of the sum of all stages (230), not conversion rates.
2. **Stage-wise Attrition:** Attrition has to be computed from the values: 30% of candidates drop out after Interview 1 (100 to 70), about 43% between Interview 2 and the Test (70 to 40), and 50% between the Test and the Final Stage (40 to 20).
3. **Overall Success Rate:** 20 of the initial 100 candidates reach the Final Stage, an overall rate of 20%. The 8.7% shown on the chart is the Final Stage's share of the total funnel area, not the success rate.
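
The three kinds of percentage that can be derived from the funnel values are easy to confuse, so it helps to compute them side by side. This is a minimal sketch on the values from the cell above; the variable names are illustrative.

```python
labels = ["Interview 1", "Interview 2", "Test", "Final Stage"]
values = [100, 70, 40, 20]

total = sum(values)
# Share of the whole funnel area: the percentage that go.Funnelarea displays.
area_share = [round(100 * v / total, 1) for v in values]
# Share of the first stage: the fraction of initial candidates still present.
retained = [round(100 * v / values[0], 1) for v in values]
# Stage-to-stage conversion: candidates at a stage relative to the previous one.
step_conversion = [round(100 * b / a, 1) for a, b in zip(values, values[1:])]

print(area_share)       # [43.5, 30.4, 17.4, 8.7]
print(retained)         # [100.0, 70.0, 40.0, 20.0]
print(step_conversion)  # [70.0, 57.1, 50.0]
```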


# 17. Trendline Plot

In [None]:
import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", symbol="smoker", color="sex", trendline="ols", trendline_scope="overall")
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation

1. **Overall Trend:** There's a positive correlation between total bill and tip, meaning as the bill amount increases, the tip amount also tends to increase. This is evident from the overall trendline.
2. **Gender Differences:** There seems to be a slight difference in tipping behavior between genders. Female smokers tend to tip slightly higher than male smokers for similar total bill amounts.
3. **Effect of Smoking:** While the sample size may be limited, the data suggests that smokers, both male and female, might tip slightly higher than non-smokers for a given total bill. However, this trend is not as pronounced as the gender difference.
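
With `trendline="ols"` and `trendline_scope="overall"`, a single least-squares line is fitted through all points, and Plotly Express relies on the `statsmodels` package to do so. The sketch below shows the same kind of straight-line fit with `numpy.polyfit` as a stand-in. The bills and tips are hypothetical values, not rows from the tips dataset.

```python
import numpy as np

# Hypothetical bills and tips, for illustration only (not the tips dataset).
total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.0, 4.5, 6.0, 7.5])

# A degree-1 least-squares fit returns the slope and intercept of the line.
slope, intercept = np.polyfit(total_bill, tip, deg=1)
print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}")
```

A positive slope corresponds to the upward trendline described above: each extra unit on the bill is associated with `slope` extra units of tip.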


# 18. Ternary Charts

In [None]:
import plotly.express as px
df = px.data.election()
fig = px.scatter_ternary(df, a="Joly", b="Coderre", c="Bergeron", hover_name="district",
    color="winner", size="total", size_max=15,
    color_discrete_map = {"Joly": "blue", "Bergeron": "green", "Coderre":"red"} )
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation

**Election Outcome**: The plot shows election results, with different districts represented by points based on vote proportions for three candidates: Joly (blue), Coderre (red), and Bergeron (green).

**Clustering of Votes**: Each point is colored by the candidate who won that district, so red points mark districts won by Coderre. The blue (Joly) and green (Bergeron) points cluster nearer the middle of the triangle, indicating closer results in those districts.

**Vote Proportion Balance**: Districts closer to the corners reflect higher votes for a specific candidate. The plot indicates a relatively balanced distribution of votes between the three candidates, with some districts showing more dominance for specific candidates.
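
A point's position in a ternary plot depends only on the proportions of its three components, which always sum to 1. The sketch below shows that normalization; `ternary_coordinates` is a hypothetical helper and the vote counts are invented for illustration.

```python
def ternary_coordinates(a: float, b: float, c: float) -> tuple[float, float, float]:
    """Normalize three non-negative counts to proportions that sum to 1."""
    total = a + b + c
    if total <= 0:
        raise ValueError("at least one component must be positive")
    return a / total, b / total, c / total

# Hypothetical vote counts for one district (not taken from the election data).
joly, coderre, bergeron = ternary_coordinates(2000, 5000, 3000)
print(joly, coderre, bergeron)  # 0.2 0.5 0.3
```

A district with these proportions would sit closest to the Coderre corner, while a district with three equal shares would sit at the center of the triangle.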

# 19. Patterned Charts


In [None]:
import plotly.express as px
df = px.data.medals_long()

fig = px.area(df, x="medal", y="count", color="nation", pattern_shape="nation")
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation
**Medal Distribution by Country**: South Korea (blue), China (red), and Canada (green) are shown with their medal counts across the gold, silver, and bronze categories. Because `px.area` stacks the traces, the top edge of the chart is the combined count of all three nations, not Canada's own count. South Korea has the highest overall medal count (48), while China and Canada are tied at 33.

**Gold Medal Dominance**: Only South Korea has more gold medals than silver or bronze (24, 13, and 11). China's largest count is silver (15), and Canada has 12 silver and 12 bronze against 9 gold.

**Proportional Representation**: South Korea leads clearly in gold, China leads in silver (15 against 13 and 12), and Canada narrowly leads in bronze (12 against 11 and 8).

# 20. Dot Plots

In [None]:
import plotly.express as px
df = px.data.medals_long()

fig = px.scatter(df, y="nation", x="count", color="medal", symbol="medal")
fig.update_traces(marker_size=10)
fig.update_layout(width=500, height=400)
fig.show()

## Interpretation
**Gold Dominance:** South Korea has the highest count for gold medals (circle symbol) with 24 medals, while China (10) and Canada (9) have far fewer.

**Medal Balance:** China’s counts range from 8 to 15 (10 gold, 15 silver, 8 bronze). Canada’s distribution is the most even, with 9 gold, 12 silver, and 12 bronze.
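
Exact counts are easier to compare in a table than on the chart. The sketch below pivots the long-format medal data into one row per nation and adds a total. The rows are typed in by hand on the assumption that they match the sample returned by `px.data.medals_long()`; with Plotly installed, that call can replace the inline frame.

```python
import pandas as pd

# Medal counts entered by hand; assumed to match px.data.medals_long().
medals = pd.DataFrame({
    "nation": ["South Korea", "China", "Canada"] * 3,
    "medal": ["gold"] * 3 + ["silver"] * 3 + ["bronze"] * 3,
    "count": [24, 10, 9, 13, 15, 12, 11, 8, 12],
})

# One row per nation, one column per medal type, plus a total column.
table = medals.pivot(index="nation", columns="medal", values="count")
table["total"] = table.sum(axis=1)
print(table)
```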