
# Background
<p>A management consulting firm, <strong>StratEdge Analytics</strong>, works with investors, market researchers, and corporate strategists to identify growth opportunities and competitive insights across industries. To support data-driven decision-making, the firm uses the Fortune 1000 dataset, which lists the top 1,000 U.S. companies ranked by revenue along with each company's sector, industry, location, revenue, profits, and employee count.</p>

# Objective

The goal is to conduct exploratory data analysis (EDA) and develop insightful visualizations to answer strategic questions such as:
<ul>
<li>What industries dominate the top revenue brackets?</li>
<li>How is company performance distributed across locations?</li>
<li>What is the relationship between revenue, profit, and employee count?</li>
<li>Which states or cities are corporate hotspots?</li>
<li>Are there observable patterns in company sizes by industry?</li>
</ul>

This analysis supports:

<ul>
<li>Private equity firms scouting for acquisition targets.</li>
<li>Executives benchmarking their company's performance.</li>
<li>Policy makers understanding geographic business density.</li>
<li>Analysts and researchers identifying economic trends.</li>
</ul>




# Key Analytic Questions

<ul>
<li>Which industries or companies lead in revenue?</li>
<li>Are higher revenues always associated with higher profits?</li>
<li>Which firms employ the most people? Any correlation to revenue?</li>
<li>Which states/cities have the highest concentration of Fortune 1000 companies?</li>
<li>What are the top-performing industries by average revenue and profit margin?</li>
</ul>
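Two of these questions (geographic concentration and profit margin) depend on values that the raw columns do not hold directly. The following is a minimal sketch of how they might be derived, using the first five rows of the dataset as sample data. The column names `City`, `State`, and `ProfitMarginPct` are hypothetical helpers introduced for illustration, and the split assumes every `Location` value follows the "City, ST" pattern.

```python
import pandas as pd

# First five rows of the dataset, copied from the head() output
sample = pd.DataFrame(
    {
        "Company": ["Walmart", "Exxon Mobil", "Apple", "Berkshire Hathaway", "McKesson"],
        "Location": ["Bentonville, AR", "Irving, TX", "Cupertino, CA", "Omaha, NE", "San Francisco, CA"],
        "Revenue": [482130, 246204, 233715, 210821, 181241],
        "Profits": [14694, 16150, 53394, 24083, 1476],
    }
)

# Hypothetical helper columns: split "City, ST" on the last comma
sample[["City", "State"]] = sample["Location"].str.rsplit(",", n=1, expand=True)
sample["State"] = sample["State"].str.strip()

# Hypothetical metric: profit as a percentage of revenue
# (a ratio, so it holds as long as both columns share the same unit)
sample["ProfitMarginPct"] = (sample["Profits"] / sample["Revenue"] * 100).round(2)

print(sample[["Company", "City", "State", "ProfitMarginPct"]])
```

With a `State` column in place, a call such as `value_counts()` on it would give a first view of geographic concentration; the margin column gives a size-independent way to compare profitability across companies.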

In [None]:
#Import the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

In [None]:
#Load the dataset
df_fr=pd.read_csv('/content/fortune1000.csv')

In [None]:
#Display the top 5 rows
df_fr.head()

Unnamed: 0,Rank,Company,Sector,Industry,Location,Revenue,Profits,Employees
0,1,Walmart,Retailing,General Merchandisers,"Bentonville, AR",482130,14694,2300000
1,2,Exxon Mobil,Energy,Petroleum Refining,"Irving, TX",246204,16150,75600
2,3,Apple,Technology,"Computers, Office Equipment","Cupertino, CA",233715,53394,110000
3,4,Berkshire Hathaway,Financials,Insurance: Property and Casualty (Stock),"Omaha, NE",210821,24083,331000
4,5,McKesson,Health Care,Wholesalers: Health Care,"San Francisco, CA",181241,1476,70400


In [None]:
#Display the last 5 rows
df_fr.tail()

Unnamed: 0,Rank,Company,Sector,Industry,Location,Revenue,Profits,Employees
995,996,New York Community Bancorp,Financials,Commercial Banks,"Westbury, NY",1902,-47,3448
996,997,Portland General Electric,Energy,Utilities: Gas and Electric,"Portland, OR",1898,172,2646
997,997,Portland General Electric,Energy,Utilities: Gas and Electric,"Portland, OR",1898,172,2646
998,999,Wendy’s,"Hotels, Resturants & Leisure",Food Services,"Dublin, OH",1896,161,21200
999,1000,Briggs & Stratton,Industrials,Industrial Machinery,"Wauwatosa, WI",1895,46,5480


In [None]:
#Display 20 random sample rows
df_fr.sample(20)

Unnamed: 0,Rank,Company,Sector,Industry,Location,Revenue,Profits,Employees
716,717,W.R. Grace,Chemicals,Chemicals,"Columbia, MD",3052,144,6700
198,199,Aramark,Business Services,Diversified Outsourcing Services,"Philadelphia, PA",14329,236,216500
373,374,Casey’s General Stores,Retailing,Specialty Retailers: Other,"Ankeny, IA",7052,181,22408
999,1000,Briggs & Stratton,Industrials,Industrial Machinery,"Wauwatosa, WI",1895,46,5480
487,488,Simon Property Group,Financials,Real estate,"Indianapolis, IN",5266,1828,4075
411,412,Rockwell Automation,Industrials,"Electronics, Electrical Equip.","Milwaukee, WI",6308,828,22500
136,137,Progressive,Financials,Insurance: Property and Casualty (Stock),"Mayfield Village, OH",20854,1268,28580
747,748,Systemax,Retailing,Specialty Retailers: Other,"Port Washington, NY",2908,-100,3300
219,220,CDW,Technology,Information Technology Services,"Lincolnshire, IL",12989,403,8465
348,349,Newmont Mining,Energy,"Mining, Crude-Oil Production","Greenwood Village, CO",7729,220,15601


In [None]:
#Display the data types of the columns
df_fr.dtypes

Unnamed: 0,0
Rank,int64
Company,object
Sector,object
Industry,object
Location,object
Revenue,int64
Profits,int64
Employees,int64


In [None]:
#Display the information about the dataset
df_fr.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Rank       1000 non-null   int64 
 1   Company    1000 non-null   object
 2   Sector     1000 non-null   object
 3   Industry   1000 non-null   object
 4   Location   1000 non-null   object
 5   Revenue    1000 non-null   int64 
 6   Profits    1000 non-null   int64 
 7   Employees  1000 non-null   int64 
dtypes: int64(4), object(4)
memory usage: 62.6+ KB


In [None]:
#Display the dimensions (shape) of the dataset
df_fr.shape

(1000, 8)

In [None]:
#Count the duplicate rows in the dataset
df_fr.duplicated().sum()

np.int64(4)

In [None]:
#Remove the duplicate rows, if any (the first occurrence of each is kept)
df_fr.drop_duplicates(inplace=True)

In [None]:
#Confirm that no duplicate rows remain
df_fr.duplicated().sum()

np.int64(0)
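
Before dropping duplicates, it can help to look at the rows being removed. The sketch below uses a small toy frame modeled on the repeated Portland General Electric row in the `tail()` output; the names `toy`, `dupes`, and `deduped` are illustrative only. It also resets the index, since `drop_duplicates` leaves gaps in the original row labels. Whether the notebook needs that step is an assumption, not something the source states.

```python
import pandas as pd

# Toy frame with one repeated row, mirroring the duplicate seen in the tail output
toy = pd.DataFrame(
    {
        "Rank": [996, 997, 997, 999],
        "Company": [
            "New York Community Bancorp",
            "Portland General Electric",
            "Portland General Electric",
            "Wendy’s",
        ],
        "Revenue": [1902, 1898, 1898, 1896],
    }
)

# keep=False flags every copy of a repeated row, not just the later ones
dupes = toy[toy.duplicated(keep=False)]
print(dupes)

# Drop the repeats and rebuild a gap-free index (0..n-1)
deduped = toy.drop_duplicates().reset_index(drop=True)
print(deduped.shape)
```

The same two calls would apply to `df_fr` unchanged. Note that removing a row also leaves a gap in the `Rank` sequence, which is worth keeping in mind if rank is used later as a count.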