# Analysis of 'Shark Tank India'

Shark Tank India is a popular reality television show that has captured the attention of audiences across the country. The show provides a unique platform for budding entrepreneurs to showcase their innovative business ideas to a panel of successful investors, known as "Sharks".

## Format of the Show

The format of Shark Tank India involves CEOs and founders presenting their business ideas to the panel of investors. The entrepreneurs typically seek funding and strategic partnerships to help grow their businesses.

During the pitch, the Sharks have the opportunity to ask questions, provide feedback, and negotiate deals with the entrepreneurs. The goal for the entrepreneurs is to secure funding from one or more of the Sharks, while the Sharks aim to invest in promising businesses that have the potential to generate significant returns.

Overall, Shark Tank India provides an exciting and educational look into the world of entrepreneurship and venture capital, while also offering valuable insights into the strategies and mindset of successful business leaders.

In [85]:
import os, warnings
import pandas as pd
from datetime import datetime 

warnings.simplefilter(
    action='ignore', 
    category=FutureWarning
)

## Loading Raw CSV File

To load the raw CSV file in Python, I used the `pandas` library, which provides a convenient `read_csv()` function.


In [86]:
raw_file_path = "C:/Users/anirudh.bhadauria/data-analytics/shark-tank-analysis/data/shark-tank-india.csv"
clean_file_path = "C:/Users/anirudh.bhadauria/data-analytics/shark-tank-analysis/data/shark_tank_clean.csv"

In [87]:
tank_df = pd.read_csv(raw_file_path, parse_dates = True)

## Preprocessing Data for Data Analysis

Data preprocessing is an essential step in any data analysis project. It involves cleaning, transforming, and preparing the raw data so that it can be used effectively in subsequent analysis steps. Here are some common data preprocessing tasks for data analysis:

1. **Handling missing values**: There are a lot of missing values in this dataset that can cause problems. Common fixes are removing rows or columns with missing data, or filling the gaps with a default value.
2. **Dropping duplicates**: Duplicate values can lead to biased results in data analysis. It's important to identify and remove duplicates from the dataset.
3. **Creating new indicator columns**: Indicator columns flag a property of some other field, for example whether a pitch lists more than one city.
4. **Capping outliers**: Outliers can skew the results of data analysis. Capping outliers involves setting a threshold value and replacing any values above or below that threshold with the threshold value.
5. **Formatting columns**: Some columns may need to be formatted before they can be used effectively in data analysis.
6. **Dropping and renaming columns**: Dropping and renaming columns can help simplify the dataset and make it easier to work with.
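
As a minimal sketch of tasks 1, 2, and 4 on a toy frame, the snippet below drops a duplicate row, caps an implausible value, and fills a gap with the rounded mean. The column values and the threshold `cap` are illustrative assumptions, not figures from the Shark Tank dataset.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the real dataset)
toy = pd.DataFrame({
    "startup": ["A", "B", "B", "C", "D"],
    "year_founded": [2015.0, 2018.0, 2018.0, None, 2030.0],
})

# Task 2: drop exact duplicate rows
toy = toy.drop_duplicates().reset_index(drop=True)

# Task 4: cap values above an assumed upper threshold
cap = 2022
toy["year_founded"] = toy["year_founded"].clip(upper=cap)

# Task 3 + 1: remember which rows were missing, then fill with the rounded mean
toy["year_missing"] = toy["year_founded"].isna().astype(int)
toy["year_founded"] = toy["year_founded"].fillna(round(toy["year_founded"].mean()))

print(toy)
```

Capping before filling keeps the outlier from pulling the mean that is used as the fill value.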

In [88]:
# Dropping duplicates

tank_df = tank_df.drop_duplicates()

In [89]:
# Dropping columns that we don't need!

tank_df = tank_df.drop(
    columns=[
        "Season Start", "Season End", "Episode Title",
        "Anchor", "Company Website", "EBITDA", 
        "Advisory Shares Equity", "Deal Has Conditions",
        "Royalty Deal"
    ]
)

In [90]:
# Renaming column names

tank_df = tank_df.rename(
    columns = {
        "SKUs": "sku",
        "Industry": "industry",
        "Cash Burn": "cash_burn",
        "Has Patents": "patents",
        "Net Margin": "net_margin",
        "Started in": "year_founded",
        "Original Air Date": "air_date",
        "Pitch Number": "pitch_number",
        "Startup Name": "startup_name",
        "Gross Margin": "gross_margin",
        "Bootstrapped": "finance_type",
        "Total Deal Debt": "deal_debt",
        "Pitchers City": "pitcher_city",
        "Season Number": "season_number",
        "Monthly Sales": "monthly_sales",
        "Debt Interest": "debt_interest",
        "Pitchers State": "pitcher_state",
        "Deal Valuation": "deal_valuation",
        "Episode Number": "episode_number",
        "Male Presenters": "male_presenter",
        "Yearly Revenue": "yearly_revenue",
        "Received Offer": "received_offer",
        "Accepted Offer": "accepted_offer",
        "Total Deal Amount": "deal_amount",
        "Total Deal Equity": "deal_equity",
        "Couple Presenters": "couple_presenter",
        "Female Presenters": "female_presenters",
        "Number of Presenters": "presenter_count",
        "Pitchers Average Age": "pitchers_avg_age",
        "Number of Sharks in Deal": "sharks_in_deal",
        "Original Ask Amount": "original_ask_amount",
        "Transgender Presenters": "trans_presenters",
        "Valuation Requested": "valueation_requested",
        "Business Description": "busniess_description",
        "Original Offered Equity": "original_offered_equity",

        "Namita Present": "namita_present",
        "Namita Debt Amount": "namita_debt_amount",
        "Namita Investment Amount": "namita_investment_amount",
        "Namita Investment Equity": "namita_investment_equtiy",

        "Vineeta Present": "vineeta_present",
        "Vineeta Debt Amount": "vineeta_debt_amount",
        "Vineeta Investment Amount": "vineeta_investment_amount",
        "Vineeta Investment Equity": "vineeta_investment_equtiy",

        "Anupam Present": "anupam_present",
        "Anupam Debt Amount": "anupam_debt_amount",
        "Anupam Investment Amount": "anupam_investment_amount",
        "Anupam Investment Equity": "anupam_investment_equtiy",

        "Aman Present": "aman_present",
        "Aman Debt Amount": "aman_debt_amount",
        "Aman Investment Amount": "aman_investment_amount",
        "Aman Investment Equity": "aman_investment_equtiy",

        "Peyush Present": "peyush_present",
        "Peyush Debt Amount": "peyush_debt_amount",
        "Peyush Investment Amount": "peyush_investment_amount",
        "Peyush Investment Equity": "peyush_investment_equtiy",

        "Amit Present": "amit_present",
        "Amit Debt Amount": "amit_debt_amount",
        "Amit Investment Amount": "amit_investment_amount",
        "Amit Investment Equity": "amit_investment_equtiy",

        "Ashneer Present": "ashneer_present",
        "Ashneer Debt Amount": "ashneer_debt_amount",
        "Ashneer Investment Amount": "ashneer_investment_amount",
        "Ashneer Investment Equity": "ashneer_investment_equtiy",

        "Guest Present": "guest_present",
        "All Guest Names": "all_guest_names",
        "Invested Guest Name": "invested_guest_name",
        "Guest Debt Amount": "guest_debt_amount",
        "Guest Investment Amount": "guest_investment_amount",
        "Guest Investment Equity": "guest_investment_equtiy",
    }
)

In [91]:
# Capping outliers in 'year_founded' at the 98th percentile

tank_df['year_founded'] = tank_df['year_founded'].clip(
    upper = tank_df['year_founded'].quantile(0.98)
)

In [92]:
# Episode number '0' are unseen, unaired 

def process_date(date):
    return datetime.strptime(date, "%d-%b-%y")

# Filling unaired date with a dummy date - '01-Jan-20'
tank_df.loc[
    tank_df['air_date'].isna(), 'air_date'
] = "01-Jan-20"

# Formatting air dates to datetime object
tank_df.loc[
    (~tank_df['air_date'].isna()), 'air_date'
] = tank_df.loc[
    (~tank_df['air_date'].isna()), 'air_date'
].apply(lambda date: process_date(date))
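
For comparison, the same conversion can be sketched with `pd.to_datetime`, which parses a whole column in one call. The sample strings below are illustrative; the format string and the dummy date are the ones used in the cell above. Month abbreviations parsed with `%b` are assumed to be in English.

```python
import pandas as pd

# Illustrative air dates; None stands in for an unaired pitch
dates = pd.Series(["20-Dec-21", None, "04-Feb-22"])

# Same dummy date as above for missing values
dates = dates.fillna("01-Jan-20")

# Vectorized parse using the same day-month-year format
parsed = pd.to_datetime(dates, format="%d-%b-%y")

print(parsed)
```

Unlike the `.loc` assignment above, this yields a `datetime64` column, which enables the `.dt` accessor for later grouping by month or year.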

In [93]:
# Formatting 'industry'

tank_df['industry'] = tank_df['industry'].replace(
    {
        "Vehicles/Electrical Vehicles": "Automobile",
        "Beauty/Fashion": "Fashion",
        "Technology/Software": "Technology",
        "Liquor/Beverages": "Beverages",
        "Furnishing/Household": "Household",
        "Medical/Health": "Healthcare",
        "Animal/Pets": "Pet"
    }
)

In [94]:
# Filling out missing 'founded year'

tank_df['year_founded'] = tank_df['year_founded'].fillna(
    tank_df['year_founded'].mean().round()
)

In [95]:
# Filling out 'presenter count'

tank_df['male_presenter'] = tank_df['male_presenter'].fillna(0)
tank_df['female_presenters'] = tank_df['female_presenters'].fillna(0)
tank_df['trans_presenters'] = tank_df['trans_presenters'].fillna(0)
tank_df['couple_presenter'] = tank_df['couple_presenter'].fillna(0)

In [96]:
# Creating a column for identifying pitchers from 'multiple cities'
# Creating a column for identifying multiple guests
# Creating a column for identifying multiple guests investments

tank_df['all_guest_names'] = tank_df['all_guest_names'].fillna("")
tank_df['invested_guest_name'] = tank_df['invested_guest_name'].fillna("")

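# na=False: a missing city counts as "not multiple" instead of yielding NaN,
# which the lambda below would otherwise treat as truthy
tank_df['multiple_pitcher_cities'] = tank_df['pitcher_city'].str.contains(",", na=False).apply(
        lambda multiple_city: 1 if multiple_city else 0
)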
tank_df['multiple_guests'] = tank_df['all_guest_names'].str.contains(",").apply(
        lambda guest: 1 if guest else 0
)
tank_df['multiple_guest_investments'] = tank_df['invested_guest_name'].str.contains(",").apply(
        lambda guest_invested: 1 if guest_invested else 0
)
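
One subtlety with these indicator columns is how missing values behave. The sketch below uses a toy object-dtype Series (city names are illustrative) to compare the lambda pattern with `str.contains(..., na=False)`. On object columns a missing entry can come back from `str.contains` as `NaN`, and `NaN` is truthy in Python, so the two variants may disagree for that row.

```python
import numpy as np
import pandas as pd

# Toy city column with one missing entry (illustrative values)
cities = pd.Series(["Mumbai", "Delhi,Pune", np.nan], dtype=object)

# Pattern used above: truthiness test on the raw result
naive = cities.str.contains(",").apply(lambda flag: 1 if flag else 0)

# Treat missing values explicitly as "no comma", then cast to 0/1
safe = cities.str.contains(",", na=False).astype(int)

print(pd.DataFrame({"city": cities, "naive": naive, "safe": safe}))
```

Filling the column with `""` first, as done above for the guest name columns, avoids the issue as well.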

In [97]:
# Filling Yearly, Monthly revenue, Gross margin, Net margin with '0'
# Maybe these pitchers came with an 'idea' only and not any POC
 
tank_df['yearly_revenue'] = tank_df['yearly_revenue'].fillna(0)
tank_df['monthly_sales'] = tank_df['monthly_sales'].fillna(0)
tank_df['net_margin'] = tank_df['net_margin'].fillna(0)
tank_df['gross_margin'] = tank_df['gross_margin'].fillna(0)

In [98]:
# Encoding 'cash_burn' and 'patents' as 1 ("yes") or 0 (missing)

tank_df['cash_burn'] = tank_df['cash_burn'].replace({"yes": 1})
tank_df['cash_burn'] = tank_df['cash_burn'].fillna(0)

tank_df['patents'] = tank_df['patents'].replace({"yes": 1})
tank_df['patents'] = tank_df['patents'].fillna(0)

In [99]:
# Filling out SKUs

tank_df['sku'] = tank_df['sku'].fillna(0)

In [100]:
# Encoding the finance_type column as integer codes
# 1- Bootstrapped
# 2- Funded
# 3- Other (maybe a non-profit org. or crowdfunded)

tank_df['finance_type'] = tank_df['finance_type'].replace(
    {
        "funded": 2,
        "yes": 1
    }
)
tank_df['finance_type'] = tank_df['finance_type'].fillna(3)
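
The same encoding can be sketched with `Series.map`, which returns a numeric column directly. The toy values below are illustrative. Note one difference from `replace`: `map` turns any value that is not a key in the dictionary into `NaN`, so unexpected labels would also end up in the "Other" bucket.

```python
import pandas as pd

# Toy finance_type values (illustrative); None stands in for a missing entry
raw = pd.Series(["yes", "funded", None, "yes"])

# Same codes as above: 1 = bootstrapped, 2 = funded, 3 = other
codes = {"yes": 1, "funded": 2}
encoded = raw.map(codes).fillna(3).astype(int)

print(encoded.tolist())
```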

In [101]:
# Filling '0' in necessary fields!

tank_df['accepted_offer'] = tank_df['accepted_offer'].fillna(0)
tank_df['deal_amount'] = tank_df['deal_amount'].fillna(0)
tank_df['deal_equity'] = tank_df['deal_equity'].fillna(0)
tank_df['deal_debt'] = tank_df['deal_debt'].fillna(0)
tank_df['debt_interest'] = tank_df['debt_interest'].fillna(0)
tank_df['deal_valuation'] = tank_df['deal_valuation'].fillna(0)

tank_df['sharks_in_deal'] = tank_df['sharks_in_deal'].fillna(0)

In [102]:
# Filling out Shark fields!

tank_df['namita_present'] = tank_df['namita_present'].fillna(0)
tank_df['namita_investment_amount'] = tank_df['namita_investment_amount'].fillna(0)
tank_df['namita_investment_equtiy'] = tank_df['namita_investment_equtiy'].fillna(0)
tank_df['namita_debt_amount'] = tank_df['namita_debt_amount'].fillna(0)

tank_df['vineeta_present'] = tank_df['vineeta_present'].fillna(0)
tank_df['vineeta_investment_amount'] = tank_df['vineeta_investment_amount'].fillna(0)
tank_df['vineeta_investment_equtiy'] = tank_df['vineeta_investment_equtiy'].fillna(0)
tank_df['vineeta_debt_amount'] = tank_df['vineeta_debt_amount'].fillna(0)

tank_df['anupam_present'] = tank_df['anupam_present'].fillna(0)
tank_df['anupam_investment_amount'] = tank_df['anupam_investment_amount'].fillna(0)
tank_df['anupam_investment_equtiy'] = tank_df['anupam_investment_equtiy'].fillna(0)
tank_df['anupam_debt_amount'] = tank_df['anupam_debt_amount'].fillna(0)

tank_df['aman_present'] = tank_df['aman_present'].fillna(0)
tank_df['aman_investment_amount'] = tank_df['aman_investment_amount'].fillna(0)
tank_df['aman_investment_equtiy'] = tank_df['aman_investment_equtiy'].fillna(0)
tank_df['aman_debt_amount'] = tank_df['aman_debt_amount'].fillna(0)

tank_df['peyush_present'] = tank_df['peyush_present'].fillna(0)
tank_df['peyush_investment_amount'] = tank_df['peyush_investment_amount'].fillna(0)
tank_df['peyush_investment_equtiy'] = tank_df['peyush_investment_equtiy'].fillna(0)
tank_df['peyush_debt_amount'] = tank_df['peyush_debt_amount'].fillna(0)

tank_df['amit_present'] = tank_df['amit_present'].fillna(0)
tank_df['amit_investment_amount'] = tank_df['amit_investment_amount'].fillna(0)
tank_df['amit_investment_equtiy'] = tank_df['amit_investment_equtiy'].fillna(0)
tank_df['amit_debt_amount'] = tank_df['amit_debt_amount'].fillna(0)

tank_df['ashneer_present'] = tank_df['ashneer_present'].fillna(0)
tank_df['ashneer_investment_amount'] = tank_df['ashneer_investment_amount'].fillna(0)
tank_df['ashneer_investment_equtiy'] = tank_df['ashneer_investment_equtiy'].fillna(0)
tank_df['ashneer_debt_amount'] = tank_df['ashneer_debt_amount'].fillna(0)

tank_df['guest_present'] = tank_df['guest_present'].fillna(0)
tank_df['guest_investment_amount'] = tank_df['guest_investment_amount'].fillna(0)
tank_df['guest_investment_equtiy'] = tank_df['guest_investment_equtiy'].fillna(0)
tank_df['guest_debt_amount'] = tank_df['guest_debt_amount'].fillna(0)
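
Since every shark has the same four fields, the fills above could also be generated in a loop. The sketch below shows the idea on a toy frame with two sharks; the column suffixes are copied from the renamed columns above (including the `equtiy` spelling), and the toy values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

sharks = ["namita", "vineeta"]  # subset of the sharks, for brevity
suffixes = ["present", "investment_amount", "investment_equtiy", "debt_amount"]

# Build the column names, e.g. "namita_present", "namita_debt_amount", ...
shark_cols = [f"{shark}_{suffix}" for shark in sharks for suffix in suffixes]

# Toy frame: first row populated, second row entirely missing
toy = pd.DataFrame({col: [1.0, np.nan] for col in shark_cols})

# One fillna call covers all shark columns
toy[shark_cols] = toy[shark_cols].fillna(0)

print(toy.T)
```

Extending `sharks` to the full panel plus `"guest"` would reproduce the cell above in a few lines.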


In [103]:
tank_df.head(2)

Unnamed: 0,season_number,startup_name,episode_number,pitch_number,air_date,industry,busniess_description,year_founded,presenter_count,male_presenter,...,vineeta_present,anupam_present,aman_present,peyush_present,amit_present,ashneer_present,guest_present,multiple_pitcher_cities,multiple_guests,multiple_guest_investments
0,1,BluePineFoods,1,1,2021-12-20 00:00:00,Food,Frozen Momos,2016.0,3,2.0,...,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0,0,0
1,1,BoozScooters,1,2,2021-12-20 00:00:00,Automobile,Renting e-bike for mobility in private spaces,2017.0,1,1.0,...,1.0,1.0,1.0,0.0,0.0,1.0,0.0,0,0,0


## Exporting clean CSV file

After preprocessing the data, it's important to export the cleaned data as a new CSV file. This ensures that the original raw data remains unchanged, and provides a clean dataset for analysis.


In [104]:
if os.path.exists(clean_file_path):
    os.remove(clean_file_path)
    
tank_df.to_csv(clean_file_path, header=True, index=False)
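
A quick way to sanity-check an export is to read the file back and compare its shape and columns. The sketch below does this with a toy frame and a temporary directory; the file name `toy_clean.csv` and the values are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for the cleaned frame (illustrative values)
toy = pd.DataFrame({
    "startup_name": ["A", "B"],
    "deal_amount": [0.0, 50.0],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_clean.csv")

    # index=False keeps the row index out of the file
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.shape == toy.shape)
print(list(reloaded.columns) == list(toy.columns))
```

By default `to_csv` opens the target in write mode and overwrites an existing file, so the explicit `os.remove` above is a safeguard rather than a requirement.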