# SDV Exploration
Exploring the open-source [SDV (Synthetic Data Vault) library](https://github.com/sdv-dev/SDV) to see whether it is a good option for our use case going forward.

In [12]:
import sdv
import pandas as pd

### Create Tables
Import the COVID-19 dataset. SDV can also model several related datasets together, linked through primary and foreign keys, which we may find useful later.
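Linking tables only works if the keys are clean: the parent key must be unique and non-null, and every child row must point at an existing parent. Below is a minimal pandas sketch of those two checks. The `visits` child table and the helper names are hypothetical (nothing like them exists in our data yet). As far as we can tell from the SDV 0.x metadata docs, the link itself would be declared through the `parent` and `foreign_key` arguments of `add_table`.

```python
import pandas as pd

# Hypothetical parent table keyed on ID, mirroring the patients table.
patients = pd.DataFrame({"ID": ["000-1-1", "000-1-10", "000-1-100"],
                         "age": [None, 78.0, 61.0]})

# Hypothetical child table: each row refers back to a patient via ID.
visits = pd.DataFrame({"visit_id": [1, 2, 3, 4],
                       "ID": ["000-1-1", "000-1-1", "000-1-100", "999-9-9"]})


def is_valid_primary_key(df: pd.DataFrame, column: str) -> bool:
    """A primary key must be present in every row and never repeat."""
    col = df[column]
    return bool(col.notna().all() and col.is_unique)


def orphan_rows(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return child rows whose foreign key has no matching row in the parent."""
    return child[~child[key].isin(parent[key])]


print(is_valid_primary_key(patients, "ID"))  # True
print(is_valid_primary_key(visits, "ID"))    # False, "000-1-1" repeats
print(orphan_rows(visits, patients, "ID"))   # only the "999-9-9" row
```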

In [13]:
df = pd.read_csv('latestdata.csv')
df.head()



| Unnamed: 0 | ID | age | sex | city | province | country | latitude | longitude | geo_resolution | date_onset_symptoms | ... | date_death_or_discharge | notes_for_discussion | location | admin3 | admin2 | admin1 | country_new | admin_id | data_moderator_initials | travel_history_binary |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 000-1-1 |  | male | Shek Lei | Hong Kong | China | 22.365019 | 114.133808 | point |  | ... |  |  | Shek Lei |  |  | Hong Kong | China | 8029.0 |  |  |
| 1 | 000-1-10 | 78.0 | male | Vo Euganeo | Veneto | Italy | 45.297748 | 11.658382 | point |  | ... | 22.02.2020 |  | Vo' Euganeo |  |  | Veneto | Italy | 8954.0 |  |  |
| 2 | 000-1-100 | 61.0 | female |  |  | Singapore | 1.35346 | 103.8151 | admin0 |  | ... | 17.02.2020 |  |  |  |  |  | Singapore | 200.0 |  |  |
| 3 | 000-1-1000 |  |  | Zhengzhou City | Henan | China | 34.62931 | 113.468 | admin2 |  | ... |  |  |  |  | Zhengzhou City | Henan | China | 10091.0 |  |  |
| 4 | 000-1-10000 |  |  | Pingxiang City | Jiangxi | China | 27.51356 | 113.9029 | admin2 |  | ... |  |  |  |  | Pingxiang City | Jiangxi | China | 7060.0 |  |  |


In [14]:
tables = {'patients': df}

### Create Metadata
[Metadata Documentation](https://sdv.dev/SDV/user_guides/relational/relational_metadata.html)

At this stage we can also add constraints to the dataset, which is worth looking into. [This tutorial](https://github.com/sdv-dev/SDV/blob/master/tutorials/single_table_data/05_Handling_Constraints.ipynb) discusses some example constraints.
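The tutorial covers SDV's own constraint handling. As a library-independent sketch of the underlying idea, here is reject sampling in plain pandas: generate rows, then discard any row that breaks a rule. The two rules (age between 0 and 120, symptom onset no later than death or discharge) and the function names are hypothetical examples chosen for this dataset. The `DD.MM.YYYY` date format is an assumption based on values such as `22.02.2020` in the `df.head()` output above.

```python
import pandas as pd


def satisfies_constraints(df: pd.DataFrame) -> pd.Series:
    """Boolean row mask for two hypothetical constraints on the patients table."""
    # Constraint 1: age, when present, must be a plausible human age.
    age_ok = df["age"].isna() | df["age"].between(0, 120)

    # Constraint 2: symptom onset cannot come after death/discharge.
    # errors="coerce" turns unparseable entries into NaT, which we treat as missing.
    onset = pd.to_datetime(df["date_onset_symptoms"], format="%d.%m.%Y", errors="coerce")
    outcome = pd.to_datetime(df["date_death_or_discharge"], format="%d.%m.%Y", errors="coerce")
    order_ok = onset.isna() | outcome.isna() | (onset <= outcome)

    return age_ok & order_ok


def reject_invalid(samples: pd.DataFrame) -> pd.DataFrame:
    """Reject sampling: keep only synthetic rows that meet every constraint."""
    return samples[satisfies_constraints(samples)]


# Made-up synthetic rows to exercise the rules.
synthetic = pd.DataFrame({
    "age": [45.0, -3.0, None, 130.0],
    "date_onset_symptoms": ["10.02.2020", "01.02.2020", "20.02.2020", None],
    "date_death_or_discharge": ["22.02.2020", None, "17.02.2020", None],
})
print(reject_invalid(synthetic))  # keeps only the first row
```

Rejection is the simplest strategy to reason about, but if a rule is violated often, many samples get thrown away and sampling slows down.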

In [15]:
metadata = sdv.Metadata()
metadata.add_table(
    name='patients', 
    data=tables['patients'], 
    primary_key='ID')

In [16]:
metadata

Metadata
  root_path: .
  tables: ['patients']
  relationships:
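When given `data`, `add_table` analyzes the columns to decide a type for each field. The hypothetical helper below is a rough illustration of that kind of dtype-based inference; the dict layout only loosely follows the SDV 0.x metadata format as we understand it and is not SDV's own code. A practical takeaway for this dataset: dates stored as strings like `22.02.2020` have pandas `object` dtype, so a dtype-based inference will most likely treat them as categorical unless we parse them to datetimes first.

```python
import pandas as pd
from pandas.api import types as ptypes


def infer_fields(df: pd.DataFrame, table_name: str, primary_key: str) -> dict:
    """Hypothetical per-column typing, loosely modelled on SDV 0.x field types."""
    fields = {}
    for name, col in df.items():
        if name == primary_key:
            fields[name] = {"type": "id"}
        elif ptypes.is_bool_dtype(col):  # check before numeric: bools count as numeric
            fields[name] = {"type": "boolean"}
        elif ptypes.is_datetime64_any_dtype(col):
            fields[name] = {"type": "datetime"}
        elif ptypes.is_numeric_dtype(col):
            subtype = "integer" if ptypes.is_integer_dtype(col) else "float"
            fields[name] = {"type": "numerical", "subtype": subtype}
        else:  # object/string columns, including unparsed date strings
            fields[name] = {"type": "categorical"}
    return {"tables": {table_name: {"primary_key": primary_key, "fields": fields}}}


sample = pd.DataFrame({
    "ID": ["000-1-1", "000-1-10"],
    "age": [None, 78.0],
    "sex": ["male", "male"],
    "date_death_or_discharge": [None, "22.02.2020"],
})
print(infer_fields(sample, "patients", "ID"))

# Parsing the date column first changes its inferred type to datetime.
parsed = sample.assign(date_death_or_discharge=pd.to_datetime(
    sample["date_death_or_discharge"], format="%d.%m.%Y"))
print(infer_fields(parsed, "patients", "ID"))
```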

### Fit SDV

In [17]:
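# Name the instance `model` so it does not shadow the `sdv` module used below
model = sdv.SDV()

In [18]:
# Learn the distribution of the patients table from the real data
model.fit(metadata, tables)

In [None]:
# Persist the fitted model so sampling can run in a separate session
model.save('covid_sdv.pkl')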

### Sample SDV

In [None]:
from sdv import SDV

loaded_sdv = SDV.load('covid_sdv.pkl')

In [None]:
samples = loaded_sdv.sample()

### Evaluation
[Evaluation Documentation](https://github.com/sdv-dev/SDV/blob/master/EVALUATION.md)

**Score**: According to the docs, the output is a maximization score indicating how good the modeling was: the higher the value, the more similar the synthetic and real tables are. In most cases the value will be negative.
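To build intuition for how a maximization score can still be negative, here is a toy analogue; it is not SDV's actual metric, and we are only assuming the real score contains likelihood-style terms. Fit a normal distribution to the real values and average the log-density of the synthetic values under it. Wherever the density is below 1, which is typical for spread-out data such as ages, its logarithm is negative, so even a good match scores below zero and a worse match scores lower still. The helper `toy_likelihood_score` and the ages are made up for illustration.

```python
import math
from statistics import NormalDist


def toy_likelihood_score(real, synthetic):
    """Hypothetical score: mean log-density of synthetic values under a normal fit to real values.

    Higher (closer to zero) means the synthetic values fall where the real data is dense.
    """
    model = NormalDist.from_samples(real)  # mean and sample standard deviation of the real data
    return sum(math.log(model.pdf(x)) for x in synthetic) / len(synthetic)


real_ages = [52, 61, 67, 70, 74, 78, 81, 85]  # made-up values
close_match = [60, 70, 75, 80]                 # similar to the real ages
poor_match = [5, 10, 15, 20]                   # far from the real ages

print(toy_likelihood_score(real_ages, close_match))  # negative, about -3.5
print(toy_likelihood_score(real_ages, poor_match))   # much more negative, about -18
```

Only the comparison between scores is meaningful here; the absolute value depends on the spread of the real data.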

In [None]:
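from sdv.evaluation import evaluate

# Score how closely the synthetic tables match the real ones (higher is better)
score = evaluate(samples, tables, metadata)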