# Data Loading, Cleaning, and Saving to SQLite

**Objective:** Load the stock dataset, clean it, and save it into a SQLite database for further analysis.

---

## 1. Import Libraries

In [2]:
# Import Libraries
import pandas as pd
import sqlite3
import numpy as np

In [3]:
# Load the dataset
df = pd.read_csv(r"C:\Users\ACER\Desktop\Portfolios\Data Analysis\google_stock_analysis\GOOG.csv")
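The absolute path above resolves only on the original machine. A minimal sketch of a load step that accepts any path or file-like object and checks the columns before continuing; the `load_stock_csv` helper and `EXPECTED_COLUMNS` list are hypothetical names, and the demo reads two sample rows from memory rather than the real `GOOG.csv`:

```python
import io

import pandas as pd

# Column names taken from the notebook output; assumed to be the full schema.
EXPECTED_COLUMNS = ["symbol", "date", "close", "high", "low", "open", "volume"]


def load_stock_csv(source):
    """Read a stock CSV and verify its columns (hypothetical helper)."""
    frame = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in frame.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return frame


# In-memory stand-in for the CSV file, so the sketch runs anywhere.
sample_csv = io.StringIO(
    "symbol,date,close,high,low,open,volume\n"
    "GOOG,2016-06-14 00:00:00+00:00,718.27,722.47,713.12,716.48,1306065\n"
    "GOOG,2016-06-15 00:00:00+00:00,718.92,722.98,717.31,719.0,1214517\n"
)
df_sample = load_stock_csv(sample_csv)
print(df_sample.shape)
```

Passing a relative path (for example, a CSV stored next to the notebook) instead of the `StringIO` object would work the same way.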

In [4]:
# Preview the first few rows of the dataset
df.head()

  symbol                       date   close    high       low    open   volume
0   GOOG  2016-06-14 00:00:00+00:00  718.27  722.47  713.1200  716.48  1306065
1   GOOG  2016-06-15 00:00:00+00:00  718.92  722.98  717.3100  719.00  1214517
2   GOOG  2016-06-16 00:00:00+00:00  710.36  716.65  703.2600  714.91  1982471
3   GOOG  2016-06-17 00:00:00+00:00  691.72  708.82  688.4515  708.65  3402357
4   GOOG  2016-06-20 00:00:00+00:00  693.71  702.48  693.4100  698.77  2082538


In [5]:
# Check data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   symbol  1258 non-null   object 
 1   date    1258 non-null   object 
 2   close   1258 non-null   float64
 3   high    1258 non-null   float64
 4   low     1258 non-null   float64
 5   open    1258 non-null   float64
 6   volume  1258 non-null   int64  
dtypes: float64(4), int64(1), object(2)
memory usage: 68.9+ KB


In [6]:
# Check for missing values
df.isnull().sum()

symbol    0
date      0
close     0
high      0
low       0
open      0
volume    0
dtype: int64

In [7]:
# Check for duplicate rows
df.duplicated().sum()

np.int64(0)
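The missing-value and duplicate checks above can be bundled into a single summary. A minimal sketch, assuming a DataFrame shaped like `df`; the `quality_report` helper and the small demo frame are illustrative and not part of the original notebook:

```python
import pandas as pd


def quality_report(frame):
    """Return row, missing-value, and duplicate-row counts (hypothetical helper)."""
    return {
        "rows": len(frame),
        "missing": int(frame.isnull().sum().sum()),
        "duplicates": int(frame.duplicated().sum()),
    }


# Illustrative frame with one missing value and one duplicated row.
demo = pd.DataFrame(
    {
        "symbol": ["GOOG", "GOOG", "GOOG", "GOOG"],
        "close": [718.27, 718.92, 718.92, None],
    }
)
report = quality_report(demo)
print(report)

# If either count is non-zero, one option is to drop the affected rows.
cleaned = demo.drop_duplicates().dropna()
```

For the GOOG dataset both counts are zero, so no rows need to be dropped.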

In [8]:
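# Parse the date strings and keep only the calendar date (drops the time and UTC offset)
# Note: .dt.date returns Python date objects, so the column dtype stays object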
df['date'] = pd.to_datetime(df['date']).dt.date
df.head()

  symbol        date   close    high       low    open   volume
0   GOOG  2016-06-14  718.27  722.47  713.1200  716.48  1306065
1   GOOG  2016-06-15  718.92  722.98  717.3100  719.00  1214517
2   GOOG  2016-06-16  710.36  716.65  703.2600  714.91  1982471
3   GOOG  2016-06-17  691.72  708.82  688.4515  708.65  3402357
4   GOOG  2016-06-20  693.71  702.48  693.4100  698.77  2082538
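If later analysis needs a true datetime column (for resampling or date arithmetic, say), one alternative to `.dt.date` is to drop the timezone and normalize to midnight. A sketch under that assumption, using two sample timestamps from the raw data; the variable names are illustrative:

```python
import pandas as pd

raw = pd.Series(["2016-06-14 00:00:00+00:00", "2016-06-15 00:00:00+00:00"])

# Approach used in the notebook: Python date objects, so the dtype is object.
as_dates = pd.to_datetime(raw, utc=True).dt.date

# Alternative: remove the UTC offset and keep a datetime64 dtype.
as_datetimes = pd.to_datetime(raw, utc=True).dt.tz_localize(None).dt.normalize()

print(as_dates.dtype)
print(as_datetimes.dtype)
```

Either form can be written to SQLite; the choice mainly affects what pandas operations are available before saving.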


In [9]:
# Connect to the SQLite database (stock_data.db is created if it does not exist)
conn = sqlite3.connect("stock_data.db")

In [10]:
# Write the cleaned DataFrame to a google_stock table, replacing the table if it already exists
df.to_sql("google_stock", conn, if_exists="replace", index=False)

1258

In [11]:
# Confirm that the table exists and query the first 5 rows
table = pd.read_sql("SELECT * FROM google_stock LIMIT 5", conn)
print(table)

  symbol        date   close    high       low    open   volume
0   GOOG  2016-06-14  718.27  722.47  713.1200  716.48  1306065
1   GOOG  2016-06-15  718.92  722.98  717.3100  719.00  1214517
2   GOOG  2016-06-16  710.36  716.65  703.2600  714.91  1982471
3   GOOG  2016-06-17  691.72  708.82  688.4515  708.65  3402357
4   GOOG  2016-06-20  693.71  702.48  693.4100  698.77  2082538
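
To confirm the table's existence directly rather than inferring it from a successful query, SQLite's `sqlite_master` catalog can be checked, and the connection can be closed once the work is done. A minimal sketch, assuming an in-memory database and two sample rows in place of `stock_data.db`:

```python
import sqlite3
from contextlib import closing

import pandas as pd

# Two sample rows standing in for the cleaned DataFrame.
sample = pd.DataFrame(
    {
        "symbol": ["GOOG", "GOOG"],
        "date": ["2016-06-14", "2016-06-15"],
        "close": [718.27, 718.92],
    }
)

# closing() ensures conn.close() runs; ":memory:" stands in for stock_data.db.
with closing(sqlite3.connect(":memory:")) as conn:
    sample.to_sql("google_stock", conn, if_exists="replace", index=False)

    # sqlite_master lists every table stored in the database.
    tables = pd.read_sql(
        "SELECT name FROM sqlite_master WHERE type = 'table'", conn
    )["name"].tolist()

    # Count the rows to check that everything was written.
    row_count = int(
        pd.read_sql("SELECT COUNT(*) AS n FROM google_stock", conn)["n"].iloc[0]
    )

print(tables, row_count)
```

With the real database, the row count would be expected to match the 1258 rows reported by `to_sql`.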
