<a href="https://colab.research.google.com/github/Arturo9314/Data_Analysis/blob/main/02-TC/TopCompanies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analysis of the Top 2000 Companies Globally

Analyzing data from the global ranking of the top 2000 companies based on revenues, profits, assets, and market value offers valuable insights into both the health of the global economy and market dynamics. This information not only aids in evaluating leading industries and countries but also guides businesses in making informed decisions regarding expansion, partnerships, and strategies.

I begin by importing the required Python libraries and the dataset, which contains the global ranking of the 2000 largest companies in the world by revenue, profits, assets, and market value, as of 2020. It also includes each company's country and continent, plus latitude and longitude coordinates.

[Dataset](https://www.kaggle.com/datasets/joebeachcapital/top-2000-companies-globally)



In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go
pio.templates.default = "plotly_white"
data = pd.read_csv("/content/drive/MyDrive/Data Analysis/Top Companies Globally/Top2000CompaniesGlobally.csv", encoding="latin-1")
data.head()

Unnamed: 0,Global Rank,Company,Sales ($billion),Profits ($billion),Assets ($billion),Market Value ($billion),Country,Continent,Latitude,Longitude
0,1,ICBC,134.8,37.8,2813.5,237.3,China,Asia,35.86166,104.195397
1,2,China Construction Bank,113.1,30.6,2241.0,202.0,China,Asia,35.86166,104.195397
2,3,JPMorgan Chase,108.2,21.3,2359.1,191.4,USA,North America,37.09024,-95.712891
3,4,General Electric,147.4,13.6,685.3,243.7,USA,North America,37.09024,-95.712891
4,5,Exxon Mobil,420.7,44.9,333.8,400.4,USA,North America,37.09024,-95.712891


### The data at a glance:

In [3]:
data.shape

(1924, 10)

In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1924 entries, 0 to 1923
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   Global Rank              1924 non-null   int64  
 1   Company                  1924 non-null   object 
 2   Sales ($billion)         1924 non-null   float64
 3   Profits ($billion)       1924 non-null   float64
 4   Assets ($billion)        1924 non-null   float64
 5   Market Value ($billion)  1924 non-null   float64
 6   Country                  1924 non-null   object 
 7   Continent                1924 non-null   object 
 8   Latitude                 1924 non-null   float64
 9   Longitude                1924 non-null   float64
dtypes: float64(6), int64(1), object(3)
memory usage: 150.4+ KB


In [5]:
data.describe()

Unnamed: 0,Global Rank,Sales ($billion),Profits ($billion),Assets ($billion),Market Value ($billion),Latitude,Longitude
count,1924.0,1924.0,1924.0,1924.0,1924.0,1924.0,1924.0
mean,997.232848,19.265904,1.22604,79.507796,19.55816,34.618747,15.455664
std,575.502781,34.683911,3.413831,261.098775,32.957023,18.259499,92.639655
min,1.0,0.0,-24.5,1.0,0.0,-40.900557,-106.346771
25%,500.75,4.1,0.3,9.675,5.3,35.86166,-95.712891
50%,997.5,9.0,0.6,19.25,9.6,37.09024,10.451526
75%,1494.25,18.425,1.2,45.8,19.2,40.463667,105.318756
max,1999.0,469.2,44.9,3226.2,416.6,61.92411,174.885971


### Numerical analysis and visualization

In [6]:
data['Market Value ($billion)'].describe()

count    1924.000000
mean       19.558160
std        32.957023
min         0.000000
25%         5.300000
50%         9.600000
75%        19.200000
max       416.600000
Name: Market Value ($billion), dtype: float64

In [7]:
data['Market Value ($billion)'].mean()

19.558160083160082

In [8]:
data['Market Value ($billion)'].median()

9.6

In [9]:
data[data['Market Value ($billion)']==data['Market Value ($billion)'].max()]['Company']

15    Apple
Name: Company, dtype: object
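
The boolean-mask lookup above can also be written with `idxmax` or `nlargest`. The sketch below uses a small made-up frame (`toy`, with invented company names and values) in place of the notebook's `data`, so the numbers are illustrative only.

```python
import pandas as pd

# Toy stand-in for the notebook's `data` frame; names and values are invented.
toy = pd.DataFrame({
    "Company": ["Alpha", "Beta", "Gamma"],
    "Market Value ($billion)": [12.5, 40.0, 7.3],
})

# idxmax returns the index label of the first row holding the largest value.
top_label = toy["Market Value ($billion)"].idxmax()
top_company = toy.loc[top_label, "Company"]

# nlargest returns the top-n rows, sorted in descending order.
top_two = toy.nlargest(2, "Market Value ($billion)")

print(top_company)
print(top_two)
```

One difference worth noting: the mask version returns every row tied for the maximum, while `idxmax` returns only the first one.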

### Categorical analysis and visualization

Let's select just two columns, `Country` and `Market Value ($billion)`, so we can aggregate market value by country.

In [10]:
data_country_marketvalue= data.loc[:,['Country','Market Value ($billion)']]
print(data_country_marketvalue)

        Country  Market Value ($billion)
0         China                    237.3
1         China                    202.0
2           USA                    191.4
3           USA                    243.7
4           USA                    400.4
...         ...                      ...
1919        USA                      7.1
1920      Japan                      0.5
1921  Singapore                      4.2
1922   Colombia                      5.8
1923        USA                      3.3

[1924 rows x 2 columns]


In [11]:
data_country_marketvalue.dtypes

Country                     object
Market Value ($billion)    float64
dtype: object

In [12]:
df_country_marketvalue = data_country_marketvalue.groupby("Country")["Market Value ($billion)"].sum()
print(df_country_marketvalue)

Country
Australia                1108.5
Belgium                   232.5
Bermuda                    74.4
Brazil                    654.4
Canada                   1221.7
Channel Islands             7.7
Chile                     103.2
China                    2762.8
Colombia                  164.6
Czech Republic             15.7
Denmark                   199.4
Egypt                       9.8
Finland                   118.4
France                   1375.8
Germany                  1263.3
Greece                     22.4
Hong Kong                 939.3
Hungary                    13.1
India                     704.7
Indonesia                 144.3
Ireland                   293.1
Israel                     70.4
Italy                     334.8
Japan                    2582.6
Jordan                      5.9
Kazakhstan                  9.9
Kuwait                     35.5
Lebanon                     4.1
Liberia                     7.4
Luxembourg                 85.0
Malaysia                  223.7
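
Because the grouped totals print in alphabetical order, the largest countries are hard to spot in the text output. A minimal sketch of sorting the totals first, using a made-up frame (`toy`) rather than the real dataset:

```python
import pandas as pd

# Toy stand-in for `data_country_marketvalue`; countries and values are invented.
toy = pd.DataFrame({
    "Country": ["A", "B", "A", "C", "B"],
    "Market Value ($billion)": [10.0, 5.0, 2.5, 1.0, 20.0],
})

# Sum market value per country, then sort so the largest totals come first.
totals = (
    toy.groupby("Country")["Market Value ($billion)"]
    .sum()
    .sort_values(ascending=False)
)

print(totals.head(2))
```

Applied to the notebook's frame, the same chain would put the highest-value countries at the top of the printed series.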


Now let’s plot these totals to see which country has the highest combined market value:

In [13]:
pastel_colors = px.colors.qualitative.Pastel
px.bar(df_country_marketvalue, color=df_country_marketvalue.index, y='Market Value ($billion)',
       color_discrete_sequence=pastel_colors,
       title='Total market value of companies by country')

Now let’s look at how market value is distributed across continents and the countries within each of them:

In [14]:
data_continent_country_marketvalue = data.loc[:,['Continent','Country','Market Value ($billion)']]
# `labels` expects a dict of column-name overrides, so the original `labels=True` is dropped here.
fig = px.treemap(data_continent_country_marketvalue, path=["Continent", "Country"], color='Continent', color_discrete_sequence=pastel_colors, values='Market Value ($billion)', title='Market value of companies by continent and country')
# Show the figure
fig.show()
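
The treemap's tile areas correspond to summed market value. The same shares can be computed as plain numbers; the following is a rough sketch on a made-up frame (`toy`), not on the real dataset:

```python
import pandas as pd

# Toy stand-in for `data_continent_country_marketvalue`; all values are invented.
toy = pd.DataFrame({
    "Continent": ["X", "X", "Y", "Y"],
    "Country": ["A", "B", "C", "C"],
    "Market Value ($billion)": [30.0, 20.0, 40.0, 10.0],
})

# Share of the overall total held by each continent (top-level tiles).
by_continent = toy.groupby("Continent")["Market Value ($billion)"].sum()
continent_share = by_continent / by_continent.sum()

# Share of each continent's total held by each country (nested tiles).
by_country = toy.groupby(["Continent", "Country"])["Market Value ($billion)"].sum()
country_share_within = by_country / by_country.groupby(level="Continent").transform("sum")

print(continent_share)
print(country_share_within)
```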

Now let’s look at the companies in North America that reported a profit:

In [15]:
data_north_america = data.loc[(data['Continent']=='North America') & (data['Profits ($billion)'] > 0), ['Company', 'Profits ($billion)']]
data_north_america = data_north_america.sort_values('Profits ($billion)').reset_index(drop=True)
print(data_north_america)

             Company  Profits ($billion)
0    Cabot Oil & Gas                 0.1
1    Harbinger Group                 0.1
2     Morgan Stanley                 0.1
3           Facebook                 0.1
4         Air Canada                 0.1
..               ...                 ...
568      Wells Fargo                18.9
569   JPMorgan Chase                21.3
570          Chevron                26.2
571            Apple                41.7
572      Exxon Mobil                44.9

[573 rows x 2 columns]


In [16]:
px.bar(data_north_america, color='Profits ($billion)', x='Company', y='Profits ($billion)', color_discrete_sequence=pastel_colors,
       title='Profitable companies in North America')

Now let’s have a look at the comparison of sales, profits, assets and market value among the top 10 companies with the highest global net profits:

In [17]:
data_profitable_companies_global = data.loc[ data['Profits ($billion)'] > 0, ['Company', 'Sales ($billion)', 'Profits ($billion)', 'Assets ($billion)', 'Market Value ($billion)']]
data_profitable_companies_global = data_profitable_companies_global.sort_values('Profits ($billion)', ascending=False).reset_index(drop=True).head(10)
print(data_profitable_companies_global)

                      Company  Sales ($billion)  Profits ($billion)  \
0                 Exxon Mobil             420.7                44.9   
1                       Apple             164.7                41.7   
2                     Gazprom             144.0                40.6   
3                        ICBC             134.8                37.8   
4     China Construction Bank             113.1                30.6   
5            Volkswagen Group             254.0                28.6   
6           Royal Dutch Shell             467.2                26.6   
7                     Chevron             222.6                26.2   
8  Agricultural Bank of China             103.0                23.0   
9               Bank of China              98.1                22.1   

   Assets ($billion)  Market Value ($billion)  
0              333.8                    400.4  
1              196.1                    416.6  
2              339.3                    111.4  
3             2813.5      

In [18]:
fig_profitable_companies_global = go.Figure()
fig_profitable_companies_global.add_trace(go.Bar( x = data_profitable_companies_global['Company'], y=data_profitable_companies_global['Sales ($billion)'], name='Sales ($billion)', marker_color='rgb(249, 177, 239)'))
fig_profitable_companies_global.add_trace(go.Bar( x = data_profitable_companies_global['Company'], y=data_profitable_companies_global['Profits ($billion)'], name='Profits ($billion)', marker_color='rgb(214, 177, 249)'))
fig_profitable_companies_global.add_trace(go.Bar( x = data_profitable_companies_global['Company'], y=data_profitable_companies_global['Assets ($billion)'], name='Assets ($billion)', marker_color='rgb(128, 161, 249)'))
fig_profitable_companies_global.add_trace(go.Bar( x = data_profitable_companies_global['Company'], y=data_profitable_companies_global['Market Value ($billion)'], name='Market Value ($billion)', marker_color='rgb(128, 236, 249)'))
fig_profitable_companies_global.update_layout(title='Comparison of sales, profits, assets, and market value among the top 10 companies with the highest global net profits.', xaxis_title='Company', yaxis_title='Billions of dollars', barmode='group', showlegend=True)
fig_profitable_companies_global.show()

Among these ten companies, asset values do not appear to track sales, profits, or market value. Now let’s analyze the correlation between sales, profits, and market value within the North America segment.

In [19]:
data_sales_profits = data.loc[data['Continent']=='North America', ['Company','Sales ($billion)', 'Profits ($billion)', 'Market Value ($billion)']].reset_index(drop=True)
print(data_sales_profits)

                        Company  Sales ($billion)  Profits ($billion)  \
0                JPMorgan Chase             108.2                21.3   
1              General Electric             147.4                13.6   
2                   Exxon Mobil             420.7                44.9   
3            Berkshire Hathaway             162.5                14.8   
4                   Wells Fargo              91.2                18.9   
..                          ...               ...                 ...   
624                 Alexander's               0.2                 0.7   
625      Two Harbors Investment               0.5                 0.3   
626                  Health Net              11.3                 0.1   
627              Tractor Supply               4.7                 0.3   
628  Old Republic International               5.0                -0.1   

     Market Value ($billion)  
0                      191.4  
1                      243.7  
2                      400.4  

In [20]:
correlation_matrix = data_sales_profits[['Sales ($billion)', 'Profits ($billion)', 'Market Value ($billion)']].corr()
fig_heatmap = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale='RdBu',
    colorbar=dict(title='Correlation'),
))
fig_heatmap.update_layout(title='Correlation matrix of sales, profits and market value within the North America segment')

fig_heatmap.show()
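
The earlier impression that assets do not track the other measures came from a bar chart of only ten companies. One way to check it numerically would be to include the assets column in the correlation matrix. A minimal sketch on made-up numbers (the frame `toy` is invented, so the coefficients say nothing about the real data):

```python
import pandas as pd

# Toy frame with invented values; in the notebook the columns would come from `data`.
toy = pd.DataFrame({
    "Sales ($billion)": [1.0, 2.0, 3.0, 4.0],
    "Profits ($billion)": [2.0, 4.0, 6.0, 8.0],
    "Assets ($billion)": [5.0, 1.0, 4.0, 2.0],
})

# Pairwise Pearson correlation coefficients between all numeric columns.
corr = toy.corr(method="pearson")

# Individual coefficients can be read off by row and column label.
sales_profits = corr.loc["Sales ($billion)", "Profits ($billion)"]
sales_assets = corr.loc["Sales ($billion)", "Assets ($billion)"]

print(corr.round(3))
```

Reading the values directly avoids having to judge the strength of a relationship from the heatmap colors alone.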

## Summary

In this study, I analyzed data from the global ranking of the top 2000 companies, based on sales, profits, assets, and market value, as of 2020. Apple emerged as the company with the highest market value, while the United States stood out as the country whose companies account for the most market value, and North America was the dominant region by market value. This prompted a focused look at North America, where Exxon Mobil was the most profitable company. A grouped bar chart then compared the sales, profits, assets, and market value of the ten most profitable companies worldwide; among them, asset values showed little apparent relationship to the other three measures. Finally, the correlation matrix for North American companies indicated a positive correlation between market value and profits, which may make the region attractive for investment. Overall, the analysis offers a snapshot of global corporate trends and of where market value and profits are concentrated.