# Pandas-MySQL

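### Install mysql-connector-python (run once)

After installing, restart the kernel (Kernel > Restart) so that the package becomes available for import.

In [None]:
# mysql-connector-python is the maintained MySQL driver on PyPI; it is imported as mysql.connector
# %pip installs into the environment of the running kernel
%pip install mysql-connector-python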

### Connect to the MySQL server

In [None]:
import mysql.connector as mysql

# connect to the database with the connect() function
db = mysql.connect(
    host="w-util-MySQL.ad.ufl.edu",
    user="fsoa_student",
    passwd="FSOAStudent!",
    database="fsoa_impink"
)

# display the connection object to confirm that connecting succeeded
db

## Cursor

A cursor lets you send queries to the MySQL database and read back their results.

In [None]:
# get a cursor
cursor = db.cursor()

cursor

### Main cursor methods

- execute: run a query
- fetchone: fetch the next row of the result as a tuple, or None when no rows remain
- fetchall: fetch all remaining rows as a list of tuples

See https://dev.mysql.com/doc/connector-python/en/ and https://www.tutorialspoint.com/python_data_access/python_mysql_cursor_object.htm
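The three methods above can be tried without access to the MySQL server. This sketch assumes an in-memory SQLite database as a stand-in: Python's built-in `sqlite3` cursors follow the same DB-API (PEP 249) method names as `mysql.connector` cursors, although SQLite's SQL dialect and its `?` placeholders differ from MySQL's. The table `demo` and its rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for the MySQL server; the cursor methods
# execute/fetchone/fetchall behave the same way under the Python DB-API.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (gvkey TEXT, fyear INTEGER)")
cur.executemany("INSERT INTO demo VALUES (?, ?)",
                [("A", 2021), ("A", 2022), ("B", 2022)])  # hypothetical rows

cur.execute("SELECT gvkey, fyear FROM demo ORDER BY fyear, gvkey")
first = cur.fetchone()   # next row, as a tuple
rest = cur.fetchall()    # all remaining rows, as a list of tuples
after = cur.fetchone()   # None: the result set is exhausted

print(first)  # ('A', 2021)
print(rest)   # [('A', 2022), ('B', 2022)]
print(after)  # None
conn.close()
```

Note that `fetchall()` returns only the rows that `fetchone()` has not already consumed.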

### 'InternalError: Unread result found.'

If you run into an 'unread result found' error, you can prevent this by using a 'buffered' cursor.
    
### Buffered Cursor

The error occurs because, without a buffered cursor, results are loaded lazily: `fetchone()` reads only one row of the query's result set from the server. If you then execute another query on the same connection, the connector complains that the remaining n-1 rows (where n is the number of rows in the result set) are still waiting to be read. A buffered cursor instead fetches ALL rows from the server right after `execute()` and keeps them in memory; `fetchone()` then simply takes the next row from that local copy, so no unread rows are left on the connection and the connector does not raise the error.

https://stackoverflow.com/questions/29772337/python-mysql-connector-unread-result-found-when-using-fetchone

https://dev.mysql.com/doc/connector-python/en/connector-python-tutorial-cursorbuffered.html
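To make the explanation above concrete, here is a toy model in plain Python. It is not mysql.connector's implementation: the class `ToyCursor`, the exception `UnreadResultError`, and the sample table names are hypothetical. It only mimics the two behaviors described: an unbuffered cursor leaves rows on the connection, so a new `execute()` fails while any remain, whereas a buffered cursor copies every row into memory at `execute()` time.

```python
class UnreadResultError(Exception):
    """Hypothetical stand-in for the connector's 'Unread result found' error."""


class ToyCursor:
    """Toy model (not mysql.connector's code) of a cursor on a single connection."""

    def __init__(self, results):
        self.results = results  # maps query text -> list of result rows
        self.unread = []        # rows still waiting on the connection (unbuffered mode)
        self.buffer = []        # rows already copied into client memory (buffered mode)

    def execute(self, query, buffered=False):
        if self.unread:
            # Rows from the previous unbuffered query were never read.
            raise UnreadResultError(f"{len(self.unread)} unread row(s) on the connection")
        rows = list(self.results[query])
        if buffered:
            self.buffer, self.unread = rows, []  # pull every row off the connection now
        else:
            self.buffer, self.unread = [], rows  # leave rows on the connection

    def fetchone(self):
        # Read from the local buffer if it has rows, otherwise from the connection.
        source = self.buffer if self.buffer else self.unread
        return source.pop(0) if source else None


# Hypothetical result set; table names other than 'funda' are made up.
results = {"SHOW TABLES": [("funda",), ("table_b",), ("table_c",)]}

lazy = ToyCursor(results)
lazy.execute("SHOW TABLES")
first_lazy = lazy.fetchone()              # two rows remain unread on the connection
try:
    lazy.execute("SHOW TABLES")
    error_message = None
except UnreadResultError as err:
    error_message = str(err)
print(first_lazy, error_message)          # ('funda',) 2 unread row(s) on the connection

buffered = ToyCursor(results)
buffered.execute("SHOW TABLES", buffered=True)
first_buffered = buffered.fetchone()
buffered.execute("SHOW TABLES", buffered=True)  # no error: nothing left on the connection
print(first_buffered)                     # ('funda',)
```

With the real connector, the alternative to `buffered=True` is to read the remaining rows (for example with `fetchall()`) before running the next query.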

In [None]:
# buffered cursor
cursor = db.cursor(buffered=True)

In [None]:
# run a query to get table names
cursor.execute("SHOW TABLES")

In [None]:
# iterate through the results; each row is a tuple
for table_name in cursor:
    print(table_name)

In [None]:
# rerun query and get all results in one go
# this returns a list of tuples
cursor.execute("SHOW TABLES")
r = cursor.fetchall()
print('result', r)
print('type of result', type(r), 'each item containing a', type(r[0]))
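Because `SHOW TABLES` returns a single column, each tuple in the `fetchall()` result holds just one table name. A short sketch of turning such a result into a plain list of names; the rows here are hypothetical (`funda` is used in this notebook, `other_table` is made up), and on the real result you would write `[row[0] for row in r]`.

```python
# Hypothetical rows shaped like the fetchall() result above:
# a list of one-element tuples.
example_rows = [("funda",), ("other_table",)]

# Take the single column of each row to get a plain list of table names.
table_names = [row[0] for row in example_rows]
print(table_names)  # ['funda', 'other_table']
```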

### Loading results into a pandas DataFrame

In [None]:
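import pandas as pd
# read_sql runs the query and returns the result as a DataFrame.
# Recent pandas versions emit a UserWarning for DBAPI connections other than sqlite3
# (SQLAlchemy is recommended); with a mysql.connector connection the query still runs.
df = pd.read_sql("SHOW TABLES", db)
df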
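`pd.read_sql` roughly replaces the manual steps used earlier: execute the query, fetch all rows, and label the columns. The sketch below compares the two routes; it assumes an in-memory SQLite table (`demo`, with made-up rows) as a stand-in for the MySQL data, and takes column names from the DB-API attribute `cursor.description`, whose entries start with the column name.

```python
import sqlite3
import pandas as pd

# Assumption: an in-memory SQLite table stands in for the MySQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (gvkey TEXT, fyear INTEGER, sale REAL)")
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)",
                 [("A", 2021, 10.0), ("B", 2021, 25.5)])  # hypothetical rows

query = "SELECT gvkey, fyear, sale FROM demo ORDER BY gvkey"

# One step: pandas runs the query and builds the DataFrame.
df_pandas = pd.read_sql(query, conn)

# By hand: execute, fetch all rows, take column names from cursor.description.
cur = conn.cursor()
cur.execute(query)
columns = [col[0] for col in cur.description]
df_manual = pd.DataFrame(cur.fetchall(), columns=columns)

same = df_pandas.to_dict("records") == df_manual.to_dict("records")
print(columns, same)
conn.close()
```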

### Example

From Compustat Fundamental Annual (funda), get the following variables:
    
- gvkey: firm identifier
- datadate: end of fiscal year, rounded (date)
- fyear: fiscal year (number)
- sich: historical 4-digit industry code (SIC)
- sale: sales
- ni: net income
- epspi: earnings per share (basic, including extraordinary items)
- at: total assets
- prcc_f: stock price at fiscal year end
- ceq: book value of common equity
- csho: number of common shares outstanding
- emp: number of employees

In [None]:
qry = '''
select gvkey, datadate, fyear, sich, sale, ni, epspi, at, prcc_f, ceq, csho, emp
from funda 
where 
    at > 0 
    and fyear >= 2013 
    and fyear <= 2022; 
'''
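To change the year range without editing the SQL text, the years can be passed as query parameters. This sketch assumes mysql.connector's `%s` placeholder style and pandas' `params` argument to `read_sql`, which pandas forwards to the cursor's `execute()`; the names `qry_params`, `first_year`, and `last_year` are hypothetical. The `read_sql` call is left as a comment because it needs the live connection `db`.

```python
# Same query as qry, with %s placeholders instead of hard-coded years.
qry_params = '''
select gvkey, datadate, fyear, sich, sale, ni, epspi, at, prcc_f, ceq, csho, emp
from funda
where
    at > 0
    and fyear >= %s
    and fyear <= %s;
'''
first_year, last_year = 2013, 2022  # hypothetical names for the year range
params = (first_year, last_year)    # filled into the placeholders in order

# Requires the MySQL connection `db` from earlier in the notebook:
# df_funda = pd.read_sql(qry_params, db, params=params)

# Sanity check: one value per placeholder.
print(qry_params.count("%s") == len(params))  # True
```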

In [None]:
df_funda = pd.read_sql(qry, db)
len(df_funda)

In [None]:
df_funda.head()

In [None]:
df_funda.info()
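In the `info()` output, a DATE column such as `datadate` typically shows up with `object` dtype, because a DB-API connection like mysql.connector returns Python `datetime.date` objects. The sketch below uses a small hypothetical DataFrame (made-up rows, named `toy`) to show the conversion to a pandas datetime column with `pd.to_datetime`, which enables the `.dt` accessor and date comparisons.

```python
import datetime
import pandas as pd

# Hypothetical rows shaped like df_funda; datadate holds datetime.date objects.
toy = pd.DataFrame({
    "gvkey": ["A", "B"],
    "datadate": [datetime.date(2021, 12, 31), datetime.date(2022, 6, 30)],
    "fyear": [2021, 2022],
})
before_dtype = toy["datadate"].dtype
print(before_dtype)  # object

# Convert to a pandas datetime column so .dt accessors and date filtering work.
toy["datadate"] = pd.to_datetime(toy["datadate"])
print(toy["datadate"].dtype)
print(toy["datadate"].dt.month.tolist())  # [12, 6]
```

Alternatively, `pd.read_sql(qry, db, parse_dates=["datadate"])` asks pandas to do the conversion while reading.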