# Introduction

### ELT Pipeline and Startup Data Analysis

This workbook analyzes startup data, focusing on success and failure rates across selected countries. It follows the ELT (Extract, Load, Transform) method: the data is first extracted from its source, loaded into a SQL Server database, and then transformed there for detailed analysis. SQL queries evaluate key metrics such as funding amounts and success rates, and calculate success-to-failure ratios rounded to two decimal places.

<div align="center">
  <img src="img/elt_1.png" alt="ELT Pipeline Diagram" style="max-width: 100%; height: auto;">
</div>

The ELT pipeline diagram above illustrates this process, showing how the raw data moves through each stage before it is analyzed. Keeping the stages separate makes it easier to trace startup performance metrics back to the raw data and to spot trends that support data-driven decisions.


## Import Required Libraries

In [1]:
import kaggle
import pandas as pd
import sqlalchemy

## STEP-1: Extract

#### Extract Data through Kaggle API

In [2]:
!kaggle datasets download yanmaksi/big-startup-secsees-fail-dataset-from-crunchbase -f "big_startup_secsees_dataset.csv"

Dataset URL: https://www.kaggle.com/datasets/yanmaksi/big-startup-secsees-fail-dataset-from-crunchbase
License(s): Community Data License Agreement - Sharing - Version 1.0
big_startup_secsees_dataset.csv.zip: Skipping, found more recently modified local copy (use --force to force download)


#### Unzip downloaded file

In [3]:
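import zipfile

# Extract the CSV from the downloaded archive into the working directory.
# The context manager closes the archive even if extraction fails.
with zipfile.ZipFile('big_startup_secsees_dataset.csv.zip') as zip_ref:
    zip_ref.extractall()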

#### Check that the data loaded correctly

In [4]:
startup = pd.read_csv("big_startup_secsees_dataset.csv")
startup.head()

Unnamed: 0,permalink,name,homepage_url,category_list,funding_total_usd,status,country_code,state_code,region,city,funding_rounds,founded_at,first_funding_at,last_funding_at
0,/organization/-fame,#fame,http://livfame.com,Media,10000000,operating,IND,16,Mumbai,Mumbai,1,,2015-01-05,2015-01-05
1,/organization/-qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,700000,operating,USA,DE,DE - Other,Delaware City,2,2014-09-04,2014-03-01,2014-10-14
2,/organization/-the-one-of-them-inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,3406878,operating,,,,,1,,2014-01-30,2014-01-30
3,/organization/0-6-com,0-6.com,http://www.0-6.com,Curated Web,2000000,operating,CHN,22,Beijing,Beijing,1,2007-01-01,2008-03-19,2008-03-19
4,/organization/004-technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,-,operating,USA,IL,"Springfield, Illinois",Champaign,1,2010-01-01,2014-07-24,2014-07-24
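
The preview shows `-` in `funding_total_usd` for one row, which suggests the column is read as text rather than as a number. The following is a minimal sketch of a sanity check, assuming a small stand-in frame built from three of the preview rows instead of the full CSV; the helper name `summarize` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Small stand-in for the loaded frame; values copied from the preview rows.
sample = pd.DataFrame({
    "name": ["#fame", ":Qounter", "004 Technologies"],
    "funding_total_usd": ["10000000", "700000", "-"],
    "status": ["operating", "operating", "operating"],
})

def summarize(df: pd.DataFrame) -> dict:
    """Return row count, column count, and the number of non-numeric funding values."""
    # errors="coerce" turns placeholders such as "-" into NaN instead of raising.
    funding = pd.to_numeric(df["funding_total_usd"], errors="coerce")
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "non_numeric_funding": int(funding.isna().sum()),
    }

summary = summarize(sample)
print(summary)
```

Running the same helper on the full `startup` frame would show how many funding values need cleaning during the Transform step.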


## STEP-2: Load

#### Load Data into SQL Server

#### Create a connection to SQL Server

In [5]:
from sqlalchemy import create_engine, exc

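# Define your connection string. The raw string (r'...') keeps the backslash
# in the instance name HAIER-PC\SQLEXPRESS from being read as an escape sequence.
conn_str = (
    r'mssql://HAIER-PC\SQLEXPRESS/startup?driver=ODBC+Driver+17+for+SQL+Server'
)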

try:
    # Create an engine
    engine = create_engine(conn_str)
    # Test the connection; the context manager releases it afterwards
    with engine.connect() as conn:
        print("Connection established")
except exc.SQLAlchemyError as e:
    print("Error: Could not make connection to Database")
    print(e)
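
One way to make the connection test more explicit is to run a trivial query and check its result. The sketch below assumes an in-memory SQLite URL as a stand-in for the SQL Server connection string so that it runs without a database server; the helper name `can_connect` is hypothetical.

```python
from sqlalchemy import create_engine, exc, text

def can_connect(url: str) -> bool:
    """Return True if a trivial query succeeds against the given database URL."""
    try:
        engine = create_engine(url)
        with engine.connect() as conn:
            # SELECT 1 returns a single row with the value 1 on most databases.
            return conn.execute(text("SELECT 1")).scalar() == 1
    except exc.SQLAlchemyError as e:
        print("Error: Could not make connection to Database")
        print(e)
        return False

# Assumption: "sqlite://" (in-memory SQLite) stands in for the SQL Server URL.
ok = can_connect("sqlite://")
print(ok)
```

With the real setup, the SQL Server connection string would be passed to `can_connect` in place of the SQLite URL.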


#### Upload data to SQL Server

In [None]:
startup.to_sql('startup_stage', con=engine, if_exists='append', index=False)
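
Because the upload uses `if_exists='append'`, rerunning the cell adds the same rows to `startup_stage` again. The sketch below illustrates the difference between `append` and `replace`, assuming an in-memory SQLite database as a stand-in for SQL Server and a few made-up rows; the helper name `row_count` is hypothetical.

```python
import sqlite3
import pandas as pd

# Assumption: in-memory SQLite stands in for SQL Server; the rows are made up.
conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["a", "b", "c"], "status": ["operating"] * 3})

def row_count(connection: sqlite3.Connection, table: str) -> int:
    """Count the rows currently stored in the given table."""
    return connection.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

# Appending twice simulates running the upload cell two times.
df.to_sql("startup_stage", conn, if_exists="append", index=False)
df.to_sql("startup_stage", conn, if_exists="append", index=False)
appended = row_count(conn, "startup_stage")

# Replacing drops and recreates the table, so the count matches the frame again.
df.to_sql("startup_stage", conn, if_exists="replace", index=False)
replaced = row_count(conn, "startup_stage")

conn.close()
print(appended, replaced)
```

Comparing the table's row count with `len(startup)` after the upload is a quick way to confirm that the staging table holds exactly one copy of the data.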

## STEP-3: Transform

So far, I've completed the extraction and loading phases; the next step is to transform the data in SQL Server. I've written the SQL queries and collected them, along with their outputs, in a file named `Transform.md`, located in the same folder as this notebook.

[View the Transform phase](https://github.com/Rajaisrarkiani/Startup-Success-and-Failure-Analysis/blob/main/Transform.md)
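
The actual queries live in `Transform.md` and run in SQL Server. As a rough illustration of the success-to-failure ratio mentioned in the introduction, here is a pandas sketch of the same kind of calculation. It rests on assumptions that are not confirmed by the notebook: the rows are made up, and treating `acquired` and `ipo` as success and `closed` as failure is a hypothetical mapping.

```python
import pandas as pd

# Made-up rows for illustration only; they are not taken from the dataset.
df = pd.DataFrame({
    "country_code": ["USA", "USA", "USA", "USA", "USA", "IND", "IND", "IND"],
    "status": ["acquired", "ipo", "closed", "closed", "closed",
               "acquired", "closed", "operating"],
})

SUCCESS = {"acquired", "ipo"}  # assumption: statuses counted as success
FAILURE = {"closed"}           # assumption: statuses counted as failure

def success_failure_ratio(frame: pd.DataFrame) -> pd.DataFrame:
    """Count successes and failures per country and compute their ratio."""
    counts = (
        frame.assign(
            success=frame["status"].isin(SUCCESS),
            failure=frame["status"].isin(FAILURE),
        )
        .groupby("country_code")[["success", "failure"]]
        .sum()
    )
    # Countries with zero failures get NaN instead of a division-by-zero result.
    failures = counts["failure"].where(counts["failure"] > 0)
    counts["ratio"] = (counts["success"] / failures).round(2)
    return counts

result = success_failure_ratio(df)
print(result)
```

Rows with other statuses, such as `operating`, count toward neither side in this sketch, so they do not affect the ratio.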
