# Tech Company Funding 


The dataset contains the following columns:

* <b>Company</b>: The name of the company
* <b>Website</b>: The company's website
* <b>Region</b>: The company's country
* <b>Vertical</b>: The company's industry
* <b>Funding Amount (USD)</b>: The amount of funding, in US dollars
* <b>Funding Stage</b>: The stage of the funding round (for example, Seed)
* <b>Funding Date</b>: The date of the funding

### Import Data and Libraries

In [276]:
import pandas as pd


In [277]:
techFundingDf = pd.read_csv('./tech_fundings.csv')
techFundingDf.head()

|   | index | Company | Website | Region | Vertical | Funding Amount (USD) | Funding Stage | Funding Date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Internxt | https://internxt.com/ | Spain | Blockchain | 278940 | Seed | 20-Jan |
| 1 | 2 | Dockflow | https://dockflow.com | Belgium | Logistics | 292244 | Seed | 20-Jan |
| 2 | 3 | api.video | https://api.video | France | Developer APIs | 300000 | Seed | 20-Jan |
| 3 | 4 | Buck.ai | https://buck.ai/ | United States | Artificial Intelligence | 300000 | Seed | 20-Jan |
| 4 | 5 | Prodsight | https://www.prodsight.ai | United Kingdom | Artificial Intelligence | 529013 | Seed | 20-Jan |


In [278]:
techFundingDf.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3575 entries, 0 to 3574
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   index                 3575 non-null   int64 
 1   Company               3575 non-null   object
 2   Website               3575 non-null   object
 3   Region                3563 non-null   object
 4   Vertical              3575 non-null   object
 5   Funding Amount (USD)  3575 non-null   object
 6   Funding Stage         3575 non-null   object
 7   Funding Date          3575 non-null   object
dtypes: int64(1), object(7)
memory usage: 223.6+ KB
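
The `info()` output above lists 3563 non-null `Region` entries out of 3575 rows, which means 12 rows have no region. A minimal sketch of how the same count can be read directly per column follows; the three-row `sample` frame is made-up data standing in for `tech_fundings.csv`, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample standing in for tech_fundings.csv (assumption, not real data)
sample = pd.DataFrame({
    'Company': ['A', 'B', 'C'],
    'Region': ['Spain', None, 'France'],
    'Funding Amount (USD)': ['278940', 'Unknown', '300000'],
})

# isnull() marks missing cells; sum() counts them column by column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

Note that the string `'Unknown'` is not counted as missing here, which is why the funding amount needs separate handling.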


### Remove rows with missing 'Region' values

In [279]:
# Keep only the rows that have a 'Region' value;
# .copy() makes the result an independent DataFrame rather than a slice of the original
condition = techFundingDf['Region'].notnull()
techFundingDf = techFundingDf[condition].copy()
techFundingDf.head()

|   | index | Company | Website | Region | Vertical | Funding Amount (USD) | Funding Stage | Funding Date |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Internxt | https://internxt.com/ | Spain | Blockchain | 278940 | Seed | 20-Jan |
| 1 | 2 | Dockflow | https://dockflow.com | Belgium | Logistics | 292244 | Seed | 20-Jan |
| 2 | 3 | api.video | https://api.video | France | Developer APIs | 300000 | Seed | 20-Jan |
| 3 | 4 | Buck.ai | https://buck.ai/ | United States | Artificial Intelligence | 300000 | Seed | 20-Jan |
| 4 | 5 | Prodsight | https://www.prodsight.ai | United Kingdom | Artificial Intelligence | 529013 | Seed | 20-Jan |


In [280]:
techFundingDf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3563 entries, 0 to 3574
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   index                 3563 non-null   int64 
 1   Company               3563 non-null   object
 2   Website               3563 non-null   object
 3   Region                3563 non-null   object
 4   Vertical              3563 non-null   object
 5   Funding Amount (USD)  3563 non-null   object
 6   Funding Stage         3563 non-null   object
 7   Funding Date          3563 non-null   object
dtypes: int64(1), object(7)
memory usage: 250.5+ KB
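
The boolean-mask filter used in this step can also be written with `DataFrame.dropna`. The sketch below is an alternative formulation, not the notebook's own code, and it runs on a made-up `sample` frame so it is self-contained.

```python
import pandas as pd

# Hypothetical sample standing in for tech_fundings.csv (assumption, not real data)
sample = pd.DataFrame({
    'Company': ['A', 'B', 'C'],
    'Region': ['Spain', None, 'France'],
})

# Drop only the rows where 'Region' is missing; other columns are not inspected
cleaned = sample.dropna(subset=['Region'])
print(cleaned)
```

As with the mask-based version, the surviving rows keep their original index labels, which is why `info()` still reports the range `0 to 3574`.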


### Prepare the 'Funding Amount (USD)' column

In [281]:
# Count the rows whose 'Funding Amount (USD)' is 'Unknown'
len(techFundingDf[techFundingDf['Funding Amount (USD)'] == 'Unknown'])

9

In [282]:
# Remove the rows whose 'Funding Amount (USD)' is 'Unknown'
# (filtering by value avoids hard-coding the row labels)
techFundingDf = techFundingDf[techFundingDf['Funding Amount (USD)'] != 'Unknown'].copy()
techFundingDf.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3554 entries, 0 to 3574
Data columns (total 8 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   index                 3554 non-null   int64 
 1   Company               3554 non-null   object
 2   Website               3554 non-null   object
 3   Region                3554 non-null   object
 4   Vertical              3554 non-null   object
 5   Funding Amount (USD)  3554 non-null   object
 6   Funding Stage         3554 non-null   object
 7   Funding Date          3554 non-null   object
dtypes: int64(1), object(7)
memory usage: 249.9+ KB


In [283]:
# Confirm that no 'Unknown' values remain
len(techFundingDf[techFundingDf['Funding Amount (USD)'] == 'Unknown'])

0

In [284]:
# Convert 'Funding Amount (USD)' from object (string) to float64
techFundingDf['Funding Amount (USD)'] = techFundingDf['Funding Amount (USD)'].astype(float)
print(techFundingDf.dtypes)

index                     int64
Company                  object
Website                  object
Region                   object
Vertical                 object
Funding Amount (USD)    float64
Funding Stage            object
Funding Date             object
dtype: object
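
Removing the `'Unknown'` rows and converting the column are two separate steps in this notebook. One possible alternative, shown here only as a sketch on made-up values, is `pd.to_numeric` with `errors='coerce'`, which turns any non-numeric string into `NaN` so that it can be dropped afterwards.

```python
import pandas as pd

# Hypothetical values mimicking the raw column (assumption, not real data)
amounts = pd.Series(['278940', 'Unknown', '300000'], name='Funding Amount (USD)')

# Non-numeric strings such as 'Unknown' are coerced to NaN instead of raising
numeric = pd.to_numeric(amounts, errors='coerce')

# Discard the entries that could not be parsed
numeric = numeric.dropna()
print(numeric.dtype)
```

The trade-off is that coercion silently discards every unparseable value, not just `'Unknown'`, so it is worth counting the `NaN` entries before dropping them.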


In [285]:
techFundingDf.describe()

|   | index | Funding Amount (USD) |
|---|---|---|
| count | 3554.0 | 3554.0 |
| mean | 1785.603545 | 57742670.0 |
| std | 1031.960588 | 298684100.0 |
| min | 1.0 | 40000.0 |
| 25% | 890.25 | 5000000.0 |
| 50% | 1787.5 | 15508300.0 |
| 75% | 2678.75 | 50000000.0 |
| max | 3575.0 | 16600000000.0 |


### Removing the Irrelevant Features

In [286]:
# Columns that are not needed for the rest of the analysis
irrelevantFeatures = ['index', 'Website', 'Funding Stage', 'Funding Date']
techFundingDf = techFundingDf.drop(columns=irrelevantFeatures)
techFundingDf.info()


<class 'pandas.core.frame.DataFrame'>
Int64Index: 3554 entries, 0 to 3574
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Company               3554 non-null   object 
 1   Region                3554 non-null   object 
 2   Vertical              3554 non-null   object 
 3   Funding Amount (USD)  3554 non-null   float64
dtypes: float64(1), object(3)
memory usage: 138.8+ KB
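
After the row removals, the index still runs from 0 to 3574 although only 3554 rows remain, so the row labels have gaps. If contiguous labels are wanted, `reset_index(drop=True)` renumbers them. The sketch below uses a made-up frame with gapped labels; it is an optional extra step, not something the original notebook does.

```python
import pandas as pd

# Hypothetical frame whose labels have gaps, as happens after dropping rows
sample = pd.DataFrame(
    {'Company': ['A', 'C', 'F'], 'Funding Amount (USD)': [278940.0, 300000.0, 529013.0]},
    index=[0, 2, 5],
)

# drop=True discards the old labels instead of adding them back as a column
renumbered = sample.reset_index(drop=True)
print(renumbered)
```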
