## Problem #1: Python Package Exploration 

A. I will be exploring the Python/Anaconda package openml. 

B. I selected this package because I am interested in using machine learning in my research and have already been introduced to the Python machine learning library scikit-learn. I may want to use datasets and tasks from the machine learning site OpenML together with scikit-learn and share my results online, so it makes sense to familiarize myself with this package. 

C. GitHub/openml-python/setup.py -> openml requires Python 3.6 or higher (lines 14 and 46). 

D. GitHub/openml-python/setup.py -> openml lists POSIX, Unix, and MacOS as its supported operating systems. 

E. GitHub/openml-python/setup.py -> openml's dependencies are liac-arff 2.4.0 or higher, xmltodict, requests, scikit-learn 0.18 or higher, python-dateutil (installed through pandas), pandas 1.0.0 or higher, scipy 0.13.3 or higher, numpy 1.6.2 or higher, minio, and pyarrow. 
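
The requirements in C and E can be checked against a local environment. The following is a minimal sketch, assuming Python 3.8 or newer so that `importlib.metadata` is available; the `REQUIRED` list and the `installed_versions` helper are names introduced here for illustration. It only reports which versions are installed and does not compare them against the minimum versions listed above.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward

# Distribution names copied from the dependency list in E.
REQUIRED = [
    "liac-arff", "xmltodict", "requests", "scikit-learn", "python-dateutil",
    "pandas", "scipy", "numpy", "minio", "pyarrow",
]

def installed_versions(names):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# openml requires Python 3.6 or higher.
python_ok = sys.version_info >= (3, 6)
print(f"Python version OK: {python_ok}")

report = installed_versions(REQUIRED)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```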

F. Class chosen (GitHub/openml-python/tests/test_tasks/test_clustering_task.py)

    1. The class is called OpenMLClusteringTaskTest(OpenMLTaskTest). From the main openml-python GitHub page, it is located at tests -> test_tasks -> test_clustering_task.py, where the class begins on line 10. 
    
    2. The purpose of this class is to ensure that the openml package correctly identifies when it can perform clustering tasks and whether a dataset is compatible with them. It also checks that datasets with missing values, categories, strings, or boolean features can be downloaded. 
    
    3. You can use it to make sure no clustering is done on a test server and to check whether you can upload a clustering task without a ground truth. 
    
    4. Yes, it inherits from another class: OpenMLTaskTest, as shown in the class definition. The top of the file imports OpenMLTaskTest along with TaskType, TestBase, and OpenMLServerException. 
    
    5. It would be nice to add a feature to this class that tests a dataset involving a calculation, to ensure that no invalid calculations occur, such as division by 0. 
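
The inheritance pattern from item 4 and the feature proposed in item 5 can be sketched with the standard `unittest` module. This is only an illustration under stated assumptions: `BaseTaskTest`, `CalculationTaskTest`, and `safe_ratio` are hypothetical names, not part of openml, and the rows are made-up stand-ins for a dataset.

```python
import unittest

def safe_ratio(numerator, denominator):
    """Hypothetical helper: return numerator / denominator, or None when the denominator is 0."""
    if denominator == 0:
        return None
    return numerator / denominator

class BaseTaskTest(unittest.TestCase):
    """Hypothetical stand-in for a shared base test class."""

    def setUp(self):
        # Made-up (numerator, denominator) pairs; the second one would divide by 0.
        self.rows = [(10, 2), (3, 0), (9, 3)]

class CalculationTaskTest(BaseTaskTest):
    """Inherits setUp from BaseTaskTest and adds one calculation check."""

    def test_no_division_by_zero(self):
        results = [safe_ratio(n, d) for n, d in self.rows]
        self.assertEqual(results, [5.0, None, 3.0])

# Run the subclass's tests without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CalculationTaskTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```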

## Problem #2: SQL Database Creation 



## Part A:

In [44]:
# Uses the SQL commands CREATE, INSERT, and SELECT.
# Connecting creates the worm_genome3.sqlite database file if it does not already exist.
from sqlite3 import connect
connection = connect("worm_genome3.sqlite")

In [45]:
cursor = connection.cursor()

In [46]:
sql = '''SELECT type, name FROM sqlite_master LIMIT 5;'''
cursor.execute(sql)

<sqlite3.Cursor at 0x23853bb0ab0>

In [47]:
sql = '''
SELECT sql
FROM sqlite_master;
'''
cursor.execute(sql)

<sqlite3.Cursor at 0x23853bb0ab0>

In [48]:
#creating the features table below. 
create_features = '''
CREATE TABLE IF NOT EXISTS features 
    (feature_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    seq_id TEXT NOT NULL, 
    source TEXT NOT NULL, 
    type TEXT NOT NULL, 
    start TEXT NOT NULL, 
    end TEXT NOT NULL, 
    score TEXT NOT NULL, 
    strand TEXT NOT NULL, 
    phase TEXT NOT NULL);
'''
try:
    cursor.execute(create_features)  # run the CREATE TABLE statement defined above
except connection.DatabaseError:
    print("Creating the features table resulted in a database error!")
    connection.rollback()  # roll back if the statement fails
    raise
else:
    connection.commit()  # commit if the statement succeeds
finally:
    print("done!")

done!
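
The commit-or-rollback pattern above can also be written with the connection as a context manager, which commits when the block succeeds and rolls back when it raises. This is a minimal sketch, assuming an in-memory database so the real file is left untouched; `demo_features` and `run_in_transaction` are hypothetical names used only here.

```python
import sqlite3

# In-memory database (assumption) so the sketch does not touch worm_genome3.sqlite.
connection = sqlite3.connect(":memory:")

create_demo = """
CREATE TABLE IF NOT EXISTS demo_features
    (feature_id INTEGER PRIMARY KEY AUTOINCREMENT,
    seq_id TEXT NOT NULL);
"""

def run_in_transaction(connection, statement, parameters=()):
    """Hypothetical helper: return True if the statement was committed, False if rolled back."""
    try:
        # "with connection" commits on success and rolls back if an exception is raised.
        with connection:
            connection.execute(statement, parameters)
    except sqlite3.DatabaseError as error:
        print(f"Statement failed and was rolled back: {error}")
        return False
    return True

created = run_in_transaction(connection, create_demo)
inserted = run_in_transaction(
    connection, "INSERT INTO demo_features (seq_id) VALUES (?)", ("I",)
)
# seq_id is NOT NULL, so this insert raises IntegrityError, a DatabaseError subclass.
rejected = run_in_transaction(
    connection, "INSERT INTO demo_features (seq_id) VALUES (?)", (None,)
)
row_count = connection.execute("SELECT COUNT(*) FROM demo_features").fetchone()[0]
```

Note that the context manager handles the transaction only; it does not close the connection.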


In [49]:
cursor.execute(create_features)

<sqlite3.Cursor at 0x23853bb0ab0>

In [50]:
connection.commit()

In [51]:
#checking the sqlite_master to make sure the table is present. 
sql = 'SELECT name, type FROM sqlite_master;'
cursor.execute(sql)
cursor.fetchall()

[('features', 'table'),
 ('sqlite_sequence', 'table'),
 ('attributes', 'table'),
 ('type_idx', 'index'),
 ('start_idx', 'index'),
 ('end_idx', 'index'),
 ('feature_id_idx', 'index'),
 ('name_idx', 'index')]

In [52]:
#creating the attributes table below. 
create_attributes = '''
CREATE TABLE IF NOT EXISTS attributes
    (attr_id INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id INTEGER NOT NULL,
    attr_name TEXT NOT NULL, 
    value TEXT NOT NULL, 
    FOREIGN KEY (feature_id) REFERENCES features (feature_id)
    );
'''
try:
    cursor.execute(create_attributes)  # run the CREATE TABLE statement defined above
except connection.DatabaseError:
    print("Creating the attributes table resulted in a database error!")
    connection.rollback()  # roll back if the statement fails
    raise
else:
    connection.commit()  # commit if the statement succeeds
finally:
    print("done!")

done!


In [21]:
cursor.execute(create_attributes)

<sqlite3.Cursor at 0x23853ae4420>

In [22]:
# Both tables now exist. SQLite creates sqlite_sequence automatically when a table uses AUTOINCREMENT.
sql = 'SELECT name, type FROM sqlite_master;'
cursor.execute(sql)
cursor.fetchall()

[('features', 'table'), ('sqlite_sequence', 'table'), ('attributes', 'table')]
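
SQLite only enforces a FOREIGN KEY clause like the one in the attributes table when foreign key support is switched on for the connection, which it normally is not by default. This is a minimal sketch, assuming an in-memory database and a cut-down features table; the inserted values are made up for illustration.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # in-memory database (assumption)
cursor = connection.cursor()

# Turn on enforcement before any transaction starts; the setting is per connection.
cursor.execute("PRAGMA foreign_keys = ON;")
cursor.execute("PRAGMA foreign_keys;")
enforcement_on = cursor.fetchone()[0] == 1

# Cut-down features table plus the attributes table from the cell above.
cursor.execute("""
CREATE TABLE features
    (feature_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL);
""")
cursor.execute("""
CREATE TABLE attributes
    (attr_id INTEGER PRIMARY KEY AUTOINCREMENT,
    feature_id INTEGER NOT NULL,
    attr_name TEXT NOT NULL,
    value TEXT NOT NULL,
    FOREIGN KEY (feature_id) REFERENCES features (feature_id)
    );
""")

# The first feature receives feature_id 1, so this attribute row is valid.
cursor.execute("INSERT INTO features (type) VALUES ('gene');")
cursor.execute(
    "INSERT INTO attributes (feature_id, attr_name, value) VALUES (1, 'ID', 'demo');"
)

# feature_id 999 does not exist in features, so this row should be rejected.
try:
    cursor.execute(
        "INSERT INTO attributes (feature_id, attr_name, value) VALUES (999, 'ID', 'orphan');"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
connection.commit()

cursor.execute("SELECT COUNT(*) FROM attributes;")
attribute_count = cursor.fetchone()[0]
```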

In [24]:
sql = '''
CREATE INDEX type_idx
ON features (type);
'''

In [25]:
cursor.execute(sql)

<sqlite3.Cursor at 0x23853ae4420>

In [26]:
connection.commit()

In [27]:
sql = '''
SELECT name, sql
FROM sqlite_master 
WHERE type = 'index';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name	sql
type_idx	CREATE INDEX type_idx
ON features (type)


In [None]:
# CREATE INDEX indexName ON tableName (columnName)

In [33]:
sql = '''
CREATE INDEX start_idx
ON features (start);
'''
cursor.execute(sql)
connection.commit()

In [35]:
sql = '''
CREATE INDEX end_idx
ON features (end);
'''
cursor.execute(sql)
connection.commit()

In [36]:
sql = '''
CREATE INDEX feature_id_idx
ON attributes (feature_id);
'''
cursor.execute(sql)
connection.commit()

In [38]:
sql = '''
CREATE INDEX name_idx
ON attributes (attr_name);
'''
cursor.execute(sql)
connection.commit()

In [39]:
sql = '''
SELECT name, sql
FROM sqlite_master 
WHERE type = 'index';
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name	sql
type_idx	CREATE INDEX type_idx
ON features (type)
start_idx	CREATE INDEX start_idx
ON features (start)
end_idx	CREATE INDEX end_idx
ON features (end)
feature_id_idx	CREATE INDEX feature_id_idx
ON attributes (feature_id)
name_idx	CREATE INDEX name_idx
ON attributes (attr_name)
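
Two small additions can make the index cells easier to rerun and verify. This is a minimal sketch, assuming an in-memory database and a cut-down features table: `IF NOT EXISTS` lets the CREATE INDEX statement run repeatedly without an error, and `EXPLAIN QUERY PLAN` shows whether a query uses the index. The exact wording of the plan varies between SQLite versions.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # in-memory database (assumption)
cursor = connection.cursor()
cursor.execute("""
CREATE TABLE features
    (feature_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type TEXT NOT NULL,
    start TEXT NOT NULL);
""")

create_index = "CREATE INDEX IF NOT EXISTS type_idx ON features (type);"
cursor.execute(create_index)
cursor.execute(create_index)  # the second run is a no-op instead of an error
connection.commit()

cursor.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'index';")
index_count = cursor.fetchone()[0]

# The last column of each plan row is the human-readable description of that step.
cursor.execute("EXPLAIN QUERY PLAN SELECT * FROM features WHERE type = ?;", ("gene",))
plan = [row[-1] for row in cursor.fetchall()]
for step in plan:
    print(step)
```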


In [40]:
def get_header(cursor):
    '''
    Makes a tab delimited header row from the cursor description.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string consisting of the column names separated by tabs, no new line
    '''
    return '\t'.join([row[0] for row in cursor.description])

In [41]:
def get_results(cursor):
    '''
    Makes a tab delimited table from the cursor results.
    Arguments:
        cursor: a cursor after a select query
    Returns:
        string: A string consisting of the result rows, one row per line, with values separated by tabs
    ''' 
    res = list()
    for row in cursor.fetchall():        
        res.append('\t'.join(map(str, row)))
    return "\n".join(res)
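
A short usage check of the two helpers follows. It is a minimal sketch, assuming an in-memory database and a made-up `demo` table; the helpers are repeated in compact form so the block runs on its own. Because each value goes through `str`, a NULL comes back as the text `None`.

```python
import sqlite3

def get_header(cursor):
    """Tab-delimited header row built from the cursor description."""
    return '\t'.join(column[0] for column in cursor.description)

def get_results(cursor):
    """Tab-delimited result rows, one row per line."""
    return '\n'.join('\t'.join(map(str, row)) for row in cursor.fetchall())

connection = sqlite3.connect(":memory:")  # in-memory database (assumption)
cursor = connection.cursor()
cursor.execute("CREATE TABLE demo (name TEXT, value INTEGER);")  # hypothetical table
cursor.executemany("INSERT INTO demo VALUES (?, ?);", [("a", 1), ("b", None)])
connection.commit()

cursor.execute("SELECT name, value FROM demo ORDER BY name;")
header = get_header(cursor)   # read the header before or after fetching; description persists
table = get_results(cursor)   # consumes the remaining rows
print(header)
print(table)
```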

In [43]:
sql = '''
SELECT name FROM sqlite_master;
'''
cursor.execute(sql)
print(get_header(cursor))
print(get_results(cursor))

name
features
sqlite_sequence
attributes
type_idx
start_idx
end_idx
feature_id_idx
name_idx


In [None]:
connec

## Part B: Populate database 