Practice with sunburst and treemap charts on the tech layoffs dataset.

Let's explore!

In [1]:
# Loading dataframe
import pandas as pd

df = pd.read_csv('/kaggle/input/tech-layoffs-2020-2024/tech_layoffs_Q2_2024.csv')

# Display the first few rows of the DataFrame
display(df.head())

# Display the DataFrame information
# (df.info() prints its summary and returns None, so display() also shows "None")
display(df.info(verbose=True))

Unnamed: 0,#,Company,Location_HQ,Region,State,Country,Continent,Laid_Off,Date_layoffs,Percentage,Company_Size_before_Layoffs,Company_Size_after_layoffs,Industry,Stage,Money_Raised_in__mil,Year,latitude,longitude
0,1,Tamara Mellon,Los Angeles,,California,USA,North America,20.0,2020-03-12,400,50,30,Retail,Series C,90.0,2020,34.053691,-118.242766
1,2,HopSkipDrive,Los Angeles,,California,USA,North America,8.0,2020-03-13,100,80,72,Transportation,Unknown,45.0,2020,34.053691,-118.242766
2,3,Panda Squad,San Francisco,San Francisco Bay Area,California,USA,North America,6.0,2020-03-13,750,8,2,Consumer,Seed,1.0,2020,37.779259,-122.419329
3,4,Help.com,Austin,,,USA,North America,16.0,2020-03-16,1000,16,0,Support,Seed,6.0,2020,30.271129,-97.7437
4,5,Inspirato,Denver,,,USA,North America,130.0,2020-03-16,220,591,461,Travel,Series C,79.0,2020,39.739236,-104.984862


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1839 entries, 0 to 1838
Data columns (total 18 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   #                            1839 non-null   int64  
 1   Company                      1839 non-null   object 
 2   Location_HQ                  1839 non-null   object 
 3   Region                       473 non-null    object 
 4   State                        566 non-null    object 
 5   Country                      1839 non-null   object 
 6   Continent                    1839 non-null   object 
 7   Laid_Off                     1677 non-null   float64
 8   Date_layoffs                 1839 non-null   object 
 9   Percentage                   1667 non-null   object 
 10  Company_Size_before_Layoffs  1585 non-null   object 
 11  Company_Size_after_layoffs   1619 non-null   object 
 12  Industry                     1839 non-null   object 
 13  Stage             

None

Let's start with the sunburst and treemap charts.

The following code creates a DataFrame that shows how many times each combination of Country and Location_HQ appears in the original DataFrame df. The result will include three columns: Country, Location_HQ, and size (the count of occurrences for each combination).
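
The counting step described above can be sketched on a tiny, made-up DataFrame to make the shape of the result visible. The names `toy` and `toy_counts` and the rows themselves are hypothetical and are not taken from the layoffs dataset; only the `groupby(...).size()` pattern matches the notebook.

```python
import pandas as pd

# Hypothetical toy data, not taken from the layoffs dataset
toy = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Germany"],
    "Location_HQ": ["Austin", "Austin", "Denver", "Berlin"],
})

# One row per (Country, Location_HQ) pair; "size" holds the number of rows in that pair
toy_counts = toy.groupby(["Country", "Location_HQ"], as_index=False).size()
print(toy_counts)
```

With `as_index=False` the grouping keys stay as ordinary columns, which is the flat layout that `px.sunburst` and `px.treemap` expect for their `path` and `values` arguments.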

In [2]:
gr_cat = df[["Country",
             "Location_HQ"]].groupby(["Country",
                                       "Location_HQ"], as_index=False).size()

**Sunburst Chart**

This code imports plotly.express and uses it to create a sunburst chart from the DataFrame gr_cat. The chart shows the hierarchy of country and headquarters location, with the size of each segment representing the number of layoff entries in the dataset (rows in df, not the number of people laid off). Each segment is colored according to its country. The chart is 1280 pixels wide and 800 pixels high and has a custom title: "Locations of tech layoffs by location HQ and countries (cities in total number)". The layout is further customized by adjusting the font size and margins. The traces are updated to display both the label and the percentage of the parent segment. Finally, the chart is displayed using fig.show().

In [3]:
import plotly.express as px

fig = px.sunburst(gr_cat, width=1280, height=800,
                  path=["Country","Location_HQ"], values="size",
                  color="Country",
                  title="<span style='font-size:18px;'><b>Locations of tech layoffs by location HQ and countries (cities in total number)</b></span>"
                  )
fig.update_layout(font_size=10, margin=dict(l=10, r=10, t=30, b=50))
fig.update_traces(textinfo="label+percent parent")
fig.show()

About 65% of the entries in the dataset are reports of layoffs in the USA. Of the USA entries, 26% are in San Francisco and 17% in New York City.
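
Percentages like these can also be computed directly instead of being read off the chart. The following is a minimal sketch on made-up counts in the same shape as gr_cat; `toy_counts`, `country_share`, and `percent_of_country` are hypothetical names, and the numbers are not the dataset's.

```python
import pandas as pd

# Hypothetical counts in the same shape as gr_cat (Country, Location_HQ, size)
toy_counts = pd.DataFrame({
    "Country": ["USA", "USA", "USA", "Germany"],
    "Location_HQ": ["San Francisco", "New York City", "Austin", "Berlin"],
    "size": [5, 3, 2, 10],
})

# Share of each country in all entries, in percent
country_share = (
    toy_counts.groupby("Country")["size"].sum() / toy_counts["size"].sum() * 100
)

# Share of each city within its own country, in percent
city_share = (
    toy_counts["size"] / toy_counts.groupby("Country")["size"].transform("sum") * 100
)
toy_counts = toy_counts.assign(percent_of_country=city_share.round(1))

print(country_share)
print(toy_counts)
```

Running the same two expressions on gr_cat should reproduce the figures the chart displays through "percent parent", assuming the chart was built from gr_cat unchanged.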

**Treemap charts**

The first treemap below shows the same data as the sunburst chart: the hierarchy of Country and Location_HQ, with each tile sized according to the size column and colored by Country. A second treemap further down filters gr_cat to include only rows where the country is the USA, stores the result in usa_df, and colors the tiles by Location_HQ instead. Both treemaps are 1280 pixels wide and 800 pixels high, show the label and the percentage of the parent tile, and are displayed using fig.show().



In [4]:
fig = px.treemap(gr_cat, width=1280, height=800,
                 path=["Country","Location_HQ"], values='size',
                 color='Country')
fig.update_traces(textinfo="label+percent parent")
fig.show()

Which chart do you prefer for this data, the sunburst or the treemap?

Let's make a treemap with only locations in the USA.

In [5]:
import pandas as pd
import plotly.express as px

# Filter the DataFrame for rows where Country is USA
usa_df = gr_cat[gr_cat['Country'] == 'USA']

# Create the treemap using the filtered DataFrame
fig = px.treemap(usa_df, width=1280, height=800,
                 path=["Country", "Location_HQ"], values='size',
                 color='Location_HQ')
fig.update_traces(textinfo="label+percent parent")
fig.show()

Let's apply more filters to the treemap chart.

This time we filter the State column for the value California.

In [6]:
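# Count entries per State and Location_HQ
# (pandas and plotly.express are already imported in earlier cells;
# rows with a missing State are dropped by groupby)
st_cat = df[["State",
             "Location_HQ"]].groupby(["State",
                                       "Location_HQ"], as_index=False).size()

# Filter for rows where State is California
Ca_df = st_cat[st_cat['State'] == 'California']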

# Create the treemap using the filtered DataFrame
fig = px.treemap(Ca_df, width=1280, height=800,
                 path=["State", "Location_HQ"], values='size',
                 color='Location_HQ')
fig.update_traces(textinfo="label+percent parent")
fig.show()

And finally, we filter the Region column for the value San Francisco Bay Area.

In [7]:
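# Count entries per Region and Location_HQ
# (pandas and plotly.express are already imported in earlier cells;
# rows with a missing Region are dropped by groupby)
re_cat = df[["Region",
             "Location_HQ"]].groupby(["Region",
                                       "Location_HQ"], as_index=False).size()

# Filter for rows where Region is San Francisco Bay Area
SF_df = re_cat[re_cat['Region'] == 'San Francisco Bay Area']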

# Create the treemap using the filtered DataFrame
fig = px.treemap(SF_df, width=1280, height=800,
                 path=["Region", "Location_HQ"], values='size',
                 color='Location_HQ')
fig.update_traces(textinfo="label+percent parent")
fig.show()
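
The three filtered treemaps above repeat the same pattern: count rows per parent/child pair, then keep a single parent value. One way to avoid the repetition is a small helper like the sketch below. `count_children` is a hypothetical function that is not part of the original notebook, and the toy rows are made up for illustration.

```python
import pandas as pd


def count_children(frame, parent_col, child_col, parent_value):
    """Count rows per (parent_col, child_col) pair, keeping one parent value.

    Hypothetical helper, not part of the original notebook.
    """
    counts = frame.groupby([parent_col, child_col], as_index=False).size()
    return counts[counts[parent_col] == parent_value].reset_index(drop=True)


# Hypothetical toy data; the row with a missing State is dropped by groupby
toy = pd.DataFrame({
    "State": ["California", "California", "California", None, "Texas"],
    "Location_HQ": ["Los Angeles", "Los Angeles", "San Francisco", "Austin", "Austin"],
})

ca_counts = count_children(toy, "State", "Location_HQ", "California")
print(ca_counts)
```

The returned DataFrame has the same columns as Ca_df or SF_df, so it could be passed to px.treemap with path=[parent_col, child_col] and values='size' in the same way as in the cells above.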