## Create a New Database [(tutorial)](https://www.sqlitetutorial.net/sqlite-python/creating-database/)
The current directory contains CSV files; the cells below load them into SQLite3 databases.

In [24]:
import sqlite3
import pandas as pd

con = sqlite3.connect("films.db")
cur = con.cursor()
# Drop any table left over from a previous run so the load starts clean
cur.execute("DROP TABLE IF EXISTS films")
names = ['id', 'title', 'release_year', 'country', 'duration', 'language', 'certification', 'gross', 'budget']
# Passing names= treats the CSV as having no header row
films = pd.read_csv("films.csv", names=names)
# index=False keeps the DataFrame index out of the table
films.to_sql('films', con, if_exists='append', index=False, chunksize=10000)
con.close()


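con = sqlite3.connect("people.db")
cur = con.cursor()
# Drop any table left over from a previous run so the load starts clean
cur.execute("DROP TABLE IF EXISTS people")
names = ['id', 'name', 'birthdate', 'deathdate']
# Passing names= treats the CSV as having no header row
people = pd.read_csv("people.csv", names=names)
# index=False keeps the DataFrame index out of the table
people.to_sql('people', con, if_exists='append', index=False, chunksize=10000)
con.close()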
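
After loading, it can help to confirm that each database file actually contains the expected table. The following is a minimal sketch using only the standard library; `table_row_counts` is a hypothetical helper name, not part of `sqlite3` or pandas. It lists the user tables recorded in `sqlite_master` and counts the rows in each.

```python
import sqlite3
from contextlib import closing


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    Hypothetical helper for illustration. Note that sqlite3.connect()
    creates an empty database file if db_path does not exist yet.
    """
    with closing(sqlite3.connect(db_path)) as con:
        tables = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        # Table names cannot be bound as query parameters,
        # so they are quoted as identifiers instead.
        return {
            t: con.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0]
            for t in tables
        }
```

For example, `table_row_counts("films.db")` should return a dictionary with a single `films` key if the load above succeeded.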



## Queries to database people.db

In [36]:
# connect to database
con = sqlite3.connect("people.db")
cur = con.cursor()

COUNT(*) tells you how many records are in a table.  
To count only the non-missing values in a particular field, call COUNT() on that field.

Comparing the count of unique values, the count of non-missing values, and the count of all records can provide useful insights into your data.
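
To make the three kinds of count concrete, here is a small sketch on made-up data in an in-memory database. The rows are invented for illustration and the connection is named `demo` so that it does not replace the notebook's `con`.

```python
import sqlite3

# Toy table in memory; the rows are illustrative, not from people.csv
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE people (id INTEGER, name TEXT, birthdate TEXT)")
demo.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, "Ann", "1970-01-01"),
        (2, "Bob", None),          # missing birthdate
        (3, "Cy", "1970-01-01"),   # duplicate birthdate
        (4, "Dee", "1985-06-15"),
    ],
)

all_rows, non_missing, unique = demo.execute(
    "SELECT COUNT(*), COUNT(birthdate), COUNT(DISTINCT birthdate) FROM people"
).fetchone()
demo.close()

print(all_rows, non_missing, unique)  # 4 3 2
```

The gap between the first two numbers is the number of missing birthdates, and the gap between the last two is the number of repeated values.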

In [26]:
# Count the number of records in the people table
# (COUNT(id) counts the records with a non-missing id)
pd.read_sql("""SELECT COUNT(id) as count_records 
               FROM people;""",
            con)


Unnamed: 0,count_records
0,8397


In [37]:
# Count the number of birthdates in the people table
pd.read_sql("""SELECT COUNT(birthdate) as count_birthdate
               FROM people;""",
            con)


Unnamed: 0,count_birthdate
0,6152


In [30]:
con.close()

## Queries to database films.db

In [41]:
# connect to database
con = sqlite3.connect("films.db")
cur = con.cursor()

As with people.db, COUNT(*) counts every record in the table, while COUNT() on a single field counts only the non-missing values in that field.

In [45]:
# Count all records, and the non-missing language and country values, in the films table

# Looking at the differences between the count of unique values,
# total values, and all records can provide useful insights into your data.

pd.read_sql("""SELECT COUNT(*) as count_all_records, 
               COUNT(language) as count_languages, 
               COUNT(country) as count_countries
               FROM films;""",
            con)

Unnamed: 0,count_all_records,count_languages,count_countries
0,4968,4957,4966


Often query results will include many duplicate values. You can use the DISTINCT keyword to select the unique values from a field.

In [46]:
# Return the unique countries from the films table

pd.read_sql("""SELECT DISTINCT country 
               FROM films;""",
            con)

Unnamed: 0,country
0,USA
1,Germany
2,Japan
3,Denmark
4,UK
...,...
60,Kenya
61,Slovenia
62,Pakistan
63,Chile


In [48]:
# Count the distinct countries from the films table

pd.read_sql("""SELECT COUNT(DISTINCT country) AS count_distinct_countries
               FROM films;""",
            con)

Unnamed: 0,count_distinct_countries
0,64
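
One detail worth keeping in mind when comparing these two queries: in SQLite, `SELECT DISTINCT` returns a missing value (NULL) as one of its rows, while `COUNT(DISTINCT ...)` skips NULLs. The sketch below shows this on a made-up in-memory table; the rows are invented for illustration and `demo` is a throwaway connection separate from the notebook's `con`.

```python
import sqlite3

# Toy table in memory; the rows are illustrative, not from films.csv
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE films (title TEXT, country TEXT)")
demo.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", "USA"), ("B", "USA"), ("C", "Japan"), ("D", None)],
)

# DISTINCT keeps one row per unique value, including one row for NULL
distinct_rows = [r[0] for r in demo.execute("SELECT DISTINCT country FROM films")]
# COUNT(DISTINCT ...) counts unique non-NULL values only
distinct_count = demo.execute(
    "SELECT COUNT(DISTINCT country) FROM films"
).fetchone()[0]
demo.close()

print(len(distinct_rows), distinct_count)  # 3 2
```

Adding `WHERE country IS NOT NULL` to the `SELECT DISTINCT` query makes the two results line up.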


In [34]:
con.close()