# SMARTER database
## Sheep samples
Collect sheep *background* samples from the SMARTER database:

In [None]:
import pandas as pd

from tskitetude import get_project_dir
from tskitetude.smarterapi import Auth, SheepEndpoint

Connect to the *SMARTER* database and retrieve information on *background* samples:

In [None]:
auth = Auth()
sheep_api = SheepEndpoint(auth)

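# Fetch the first page, then keep requesting pages while a "next" link exists
data = sheep_api.get_samples(type="background")
page = 1
frames = [pd.DataFrame(data["items"])]

while data["next"] is not None:
    data = sheep_api.get_samples(page=page + 1, type="background")
    frames.append(pd.DataFrame(data["items"]))
    page = data["page"]

# Concatenate once at the end instead of growing the DataFrame page by page
df = pd.concat(frames, ignore_index=True)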

df.info()
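
The pagination pattern above can be wrapped in a small helper. This is only a sketch: `fetch_all` and `fake_get_samples` are hypothetical names, and the fake endpoint merely imitates the `items`, `page` and `next` keys used in the cell above so that the example runs without a connection to SMARTER.

```python
import pandas as pd


def fetch_all(get_page, **params):
    """Collect the items of every page into a single DataFrame.

    `get_page` is a hypothetical callable returning a dict with
    "items", "page" and "next" keys, like the responses used above.
    """
    data = get_page(**params)
    frames = [pd.DataFrame(data["items"])]
    while data["next"] is not None:
        data = get_page(page=data["page"] + 1, **params)
        frames.append(pd.DataFrame(data["items"]))
    return pd.concat(frames, ignore_index=True)


# Stand-in for the real endpoint (assumption): three pages, two samples each
def fake_get_samples(page=1, **params):
    items = [
        {"smarter_id": f"ID-{page}-{i}", "type": params.get("type")}
        for i in range(2)
    ]
    next_link = None if page == 3 else f"?page={page + 1}"
    return {"items": items, "page": page, "next": next_link}


all_samples = fetch_all(fake_get_samples, type="background")
print(len(all_samples))
```

With the real client, the call would presumably look like `fetch_all(sheep_api.get_samples, type="background")`, assuming `get_samples` accepts `page` as a keyword argument as it does in the cell above.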

Are those all *background* samples?

In [None]:
df.value_counts("type")

Ok. Let's collect all available species:

In [None]:
df.value_counts("species")

Ok, now collect all samples which are *Ovis aries*:

In [None]:
ovis_aries = df[df["species"] == "Ovis aries"]
ovis_aries.head()

How many breeds do I have?

In [None]:
ovis_aries.value_counts("breed")

Check that no sheep breed name contains *mouflon*:

In [None]:
ovis_aries["breed"].str.contains("Mouflon", case=False).any()
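
As a small illustration of how this check behaves, here is a sketch on made-up breed names (the values below are not taken from SMARTER). It adds `na=False`, so that a missing breed name counts as "no match" rather than producing a missing value in the mask:

```python
import pandas as pd

# Toy breed names, including one missing value (illustrative data only)
breeds = pd.Series(["Merino", "European mouflon", None, "Texel"])

# case=False makes the match case-insensitive;
# na=False treats missing breed names as "no match"
mask = breeds.str.contains("mouflon", case=False, na=False)

print(mask.tolist())
print(mask.any())
```

Indexing with the mask (`breeds[mask]`) shows which names matched, which helps if the check ever returns `True`.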

Ok, now collect *Ovis aries musimon* samples:

In [None]:
ovis_aries_musimon = df[df["species"] == "Ovis aries musimon"]
ovis_aries_musimon.head()

How many breeds do I have?

In [None]:
ovis_aries_musimon.value_counts("breed")

Ok, try to collect *European mouflon*:

In [None]:
european_mouflon = ovis_aries_musimon[ovis_aries_musimon["breed"] == "European mouflon"]
european_mouflon.head()

Ok, let's choose `FROA-EUR-000000789` as my *outgroup* sample:

In [None]:
outgroup = european_mouflon[european_mouflon["smarter_id"] == "FROA-EUR-000000789"]
outgroup
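
Since the outgroup is meant to be a single sample, it can be worth verifying that the filter matched exactly one row. A minimal sketch on toy data follows; the IDs and the `select_one` helper are hypothetical:

```python
import pandas as pd

# Toy sample table (illustrative IDs, not real SMARTER identifiers)
samples = pd.DataFrame(
    {
        "smarter_id": ["ID-1", "ID-2", "ID-3"],
        "breed": ["European mouflon", "European mouflon", "Merino"],
    }
)


def select_one(df, smarter_id):
    """Return the single row matching `smarter_id`, or raise if not exactly one."""
    selected = df[df["smarter_id"] == smarter_id]
    if len(selected) != 1:
        raise ValueError(f"expected 1 row for {smarter_id}, found {len(selected)}")
    return selected


selected = select_one(samples, "ID-2")
print(selected)
```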

Now create a tab-separated *sample* file which I can use to extract the samples I need from the SMARTER database using plink:

In [None]:
smarter_dataset = pd.concat([outgroup, ovis_aries])
smarter_dataset[["smarter_id", "breed_code"]].to_csv(
    get_project_dir() / "data/sheep_dataset.tsv",
    index=False,
    header=False,
    sep="\t",
)
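
To see what the resulting file looks like, the following sketch writes a toy table with the same `to_csv` options to a temporary directory and reads it back. The data and the temporary path are made up for illustration:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy dataset with the two columns written above plus an extra one
dataset = pd.DataFrame(
    {
        "smarter_id": ["ID-1", "ID-2"],
        "breed_code": ["AAA", "BBB"],
        "breed": ["Breed A", "Breed B"],
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "sheep_dataset.tsv"
    # Same options as above: no index, no header, tab-separated
    dataset[["smarter_id", "breed_code"]].to_csv(
        path, index=False, header=False, sep="\t"
    )
    lines = path.read_text().splitlines()

print(lines)
```

Each line holds one sample as two tab-separated fields. If the file is passed to plink's `--keep`, which reads the family ID from the first column and the within-family ID from the second, it is worth confirming that this column order matches the fileset's `.fam` file.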