# Introduction


This tutorial shows how to use the `carpyncho` Python client.

First we need to import the module and instantiate the client.

In [1]:
# import the module
import carpyncho

# instantiate the client
client = carpyncho.Carpyncho()

First, let's check which tiles have catalogs available for download.

In [2]:
client.list_tiles()

('others',
 'b206',
 'b214',
 'b216',
 'b220',
 'b228',
 'b234',
 'b247',
 'b248',
 'b261',
 'b262',
 'b263',
 'b264',
 'b277',
 'b278',
 'b356',
 'b360',
 'b396')

Let's assume we are interested in the tile `b216`; we can check which catalogs are available for this tile.

In [3]:
client.list_catalogs("b216")

('features', 'lc')

We see that two catalogs are available: the light curves (`lc`) and the features of those curves (`features`).

We can now retrieve more information about either of these catalogs. For simplicity, let's check the *b216 lc* catalog.
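The two listing calls can be combined to get an overview of every tile at once. The following is a minimal sketch, not part of the library: `catalogs_by_tile` is a hypothetical helper, and `StubClient` is an illustrative stand-in (with made-up contents for `b206`) so the snippet runs without network access. It only assumes that the client exposes `list_tiles()` and `list_catalogs(tile)` as shown above.

```python
def catalogs_by_tile(client):
    """Map every tile name to the tuple of catalogs available for it."""
    return {
        tile: tuple(client.list_catalogs(tile))
        for tile in client.list_tiles()
    }


class StubClient:
    """Hypothetical offline stand-in mimicking the two calls used above.

    The b216 entry mirrors the tutorial output; the b206 entry is
    illustrative only and is not real server data.
    """

    _catalogs = {"b206": ("features", "lc"), "b216": ("features", "lc")}

    def list_tiles(self):
        return tuple(self._catalogs)

    def list_catalogs(self, tile):
        return self._catalogs[tile]


available = catalogs_by_tile(StubClient())
print(available)
```

With a real session, passing the `client` created in the first cell instead of `StubClient()` should work the same way, although it issues one `list_catalogs` call per tile.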

In [4]:
client.catalog_info("b216", "lc")

{'hname': 'Time-Series',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '236e126f82e80684f29247220470b831  lc_obs_b216.parquet.bz2',
 'filename': 'lc_obs_b216.parquet.bz2',
 'driveid': '1C-_3A6almD42ewASe8n74Y355mYn9tZG',
 'size': 369866999,
 'records': 37839384}

The key `hname` is a human-readable version of the catalog name. The next two keys (`format` and `extension`) describe how the catalog is stored in the cloud. After those come the publication date of the file (`date`), its checksum (`md5sum`), its file name (`filename`) and its cloud ID (`driveid`); all of these are mostly for internal use.
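Although the checksum is mostly for internal use, it can also be used to check a file by hand. The sketch below assumes the `md5sum` entry follows the usual `<digest>  <filename>` layout seen in the output above; `parse_md5sum` and `file_md5` are hypothetical helpers, not part of `carpyncho`, and the source does not say whether the client already performs this check itself. Because the real catalog is not on disk here, the digest function is demonstrated on a small temporary file.

```python
import hashlib
import os
import tempfile


def parse_md5sum(field):
    """Split an entry of the form '<hex digest>  <filename>' into its parts."""
    digest, filename = field.split(maxsplit=1)
    return digest, filename


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


# Value copied from the catalog_info output above
expected, filename = parse_md5sum(
    "236e126f82e80684f29247220470b831  lc_obs_b216.parquet.bz2"
)
print(filename, expected)

# Demonstration on a throwaway file (the real catalog is not available here)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_digest = file_md5(tmp.name)
os.remove(tmp.name)
print(demo_digest)
```

For a catalog file saved locally, comparing `file_md5(path)` against `expected` would indicate whether the file matches the published checksum.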

The two most important fields are `size`, the size of the file in bytes (*352.7 MiB*), and `records`, the number of records stored in the file (more than 37 million).

OK, that is too big. Let's check the *b216 features* catalog instead.
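The MiB figures quoted in this tutorial come directly from the `size` field. As a small sketch (the helper name `bytes_to_mib` is an assumption made for illustration), the conversion divides the byte count by 2**20:

```python
def bytes_to_mib(n_bytes):
    """Convert a size in bytes to mebibytes (1 MiB = 2**20 bytes)."""
    return n_bytes / 2**20


# 'size' values reported by catalog_info in this tutorial
lc_size = 369866999        # b216 lc
features_size = 149073679  # b216 features

print(f"lc:       {bytes_to_mib(lc_size):.1f} MiB")
print(f"features: {bytes_to_mib(features_size):.1f} MiB")
```

Note that this uses binary mebibytes; dividing by 10**6 instead would give decimal megabytes, which yields slightly larger numbers.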

In [5]:
client.catalog_info("b216", "features")

{'hname': 'Features',
 'format': 'BZIP2-Parquet',
 'extension': '.parquet.bz2',
 'date': '2020-04-14',
 'md5sum': '433aae05541a2f5b191aa95d717fa83c  features_b216.parquet.bz2',
 'filename': 'features_b216.parquet.bz2',
 'driveid': '1-t165sLjn0k507SFeW-A4p9wYVL9rP4B',
 'size': 149073679,
 'records': 334773}

This file is only `142.2 MiB`, so let's retrieve it into a dataframe.

In [6]:
# the first time this can be slow
df = client.get_catalog("b216", "features")
df

We have a lot of information to play with here. Let's check whether we have sources of multiple types.

In [None]:
df.groupby("vs_type").id.count()

There are 41 RRab stars (and more than 334K sources of unknown type).

We have a lot to work with here, for example to make some plots.

From now on, you simply have a big pandas dataframe to manipulate.

All the methods of the `carpyncho.Carpyncho` client are well documented, and you
can access their documentation with the `?` command in Jupyter.

In [None]:
client.get_catalog?
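The `?` syntax is specific to IPython and Jupyter. In a plain Python shell or script, the same docstrings can be reached with the built-in `help()` or with `inspect.getdoc()`. The sketch below uses `json.dumps` purely as a stand-in for any documented callable, so that it runs without a `carpyncho` client; with a client available, `client.get_catalog` could be passed instead.

```python
import inspect
import json

# json.dumps is only a stand-in for any documented callable,
# e.g. client.get_catalog when a carpyncho client is available
doc = inspect.getdoc(json.dumps)

# Show just the summary line of the docstring
print(doc.splitlines()[0])

# help() prints the signature and the full docstring in any Python shell
help(json.dumps)
```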

In [None]:
import datetime as dt
dt.datetime.now()