# Data and meta-data structure

Tally builds upon the ``pandas`` library. The case data is represented
by a pandas dataframe and each column is a pandas Series. The native
format for Tally to save this data is a parquet file.

Tally also defines its own meta-data to describe the data columns
and provide additional information on the underlying structure of the data.

## The case-data dataframe
We can retrieve the dataframe with the `data()` method.

 


In [2]:
import quantipy as tally
dataset = tally.DataSet("Museums")
dataset.read_quantipy("./data/Example_Museum.json", "./data/Example_Museum.parquet")
dataset.data().head()

Unnamed: 0,id_HDATA,Respondent.Serial,DataCollection.Status,DataCollection.StartTime,DataCollection.FinishTime,DataCollection.RoutingContext,address,age,before,biology,...,rating_ent[{whales}].rating_ent_grid,rating_ent[{mammals}].rating_ent_grid,rating_ent[{minerals}].rating_ent_grid,rating_ent[{ecology}].rating_ent_grid,rating_ent[{botany}].rating_ent_grid,rating_ent[{origin_of_species}].rating_ent_grid,rating_ent[{human_biology}].rating_ent_grid,rating_ent[{evolution}].rating_ent_grid,rating_ent[{wildlife_in_danger}].rating_ent_grid,@1
0,1,1,177;,2002-07-19 12:42:30.999,2002-07-19 14:52:31,186,"124 Dill Hall Lane, Church Ditton",5,9,10,...,48;,49;,51;,48;,48;,51;,51;,51;,51;,1.0
1,2,2,177;,2002-07-19 12:42:30.999,2002-07-19 16:52:31,186,"22 Southbank Road, Hounslow",4,10,10,...,51;,51;,52;,51;,51;,51;,50;,51;,51;,1.0
2,3,3,177;,2002-07-19 12:42:30.999,2002-07-19 18:52:31,186,"Gatehouse, Church Strarmthorpe",4,10,10,...,51;,48;,48;,51;,48;,51;,50;,51;,52;,1.0
3,4,4,177;,2002-07-19 12:42:30.999,2002-07-19 20:52:31,186,"151 Linacre Road, London SE2",4,9,10,...,48;,48;,51;,48;,48;,51;,48;,51;,48;,1.0
4,5,5,177;,2002-07-19 12:42:30.999,2002-07-19 22:52:31,186,"73 Kings Road, North Ormesby",5,9,10,...,52;,51;,52;,51;,51;,51;,50;,52;,51;,1.0


:::{note} 
When examining pandas dataframes, we often use the `pandas.DataFrame.head(n=5)` method, which returns the first `n` rows of the dataframe (five by default).
:::

Tally also mimics the pandas `[]` syntax. You can view one or more variables using this bracket syntax.

In [5]:
dataset[['age', 'gender']].head()

Unnamed: 0,age,gender
0,5,23
1,4,24
2,4,24
3,4,23
4,5,24
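
For comparison, the bracket syntax that Tally mimics behaves as follows in plain pandas. This is a minimal sketch, not Tally code: the small dataframe `df` is a hypothetical stand-in whose values are copied from the first rows of the outputs above.

```python
import pandas as pd

# Hypothetical stand-in for the case data; values copied from the outputs above.
df = pd.DataFrame({
    "age": [5, 4, 4, 4, 5],
    "gender": [23, 24, 24, 23, 24],
    "before": [9, 10, 10, 9, 9],
})

one_column = df["age"]           # a single label returns a Series
subset = df[["age", "gender"]]   # a list of labels returns a DataFrame

print(type(one_column).__name__)
print(type(subset).__name__)
print(subset.head(3))            # first three rows of the two selected columns
```

In pandas, a list of labels always yields a dataframe, even when the list holds a single name, which is why the call above uses double brackets.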


Alongside the case data, each ``DataSet`` offers a metadata component that
describes the data columns and provides additional information on the
characteristics of the underlying structure. The metadata document is
implemented as a nested ``dict`` and provides the following keys on its
first level:

| element | contains |
| ------- | -------- |
| ``'type'`` | case data type |
| ``'info'`` | info on the source data |
| ``'lib'`` | shared use references |
| ``'columns'`` | info on ``DataFrame`` columns (types, labels, etc.) |
| ``'sets'`` | ordered groups of variables pointing to other parts of the meta |
| ``'masks'`` | complex variable type definitions (arrays, dichotomous, etc.) |
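
To make the nesting concrete, here is a minimal sketch of what such a metadata document could look like. Only the six first-level keys are taken from the table above; every nested key and value (the column descriptions, the `"data file"` set, the `"source"` entry) is a hypothetical placeholder and should not be read as the exact schema Tally writes.

```python
# Illustrative skeleton only: the six top-level keys come from the table
# above; all nested keys and values are hypothetical placeholders.
meta = {
    "type": "pandas.DataFrame",
    "info": {"source": "Example_Museum"},
    "lib": {},
    "columns": {
        "age": {"type": "single", "text": "Age of respondent"},
        "gender": {"type": "single", "text": "Gender"},
    },
    "sets": {"data file": {"items": ["columns@age", "columns@gender"]}},
    "masks": {},
}

# First-level keys of the metadata document
print(sorted(meta))

# Look up the (hypothetical) description of one column
print(meta["columns"]["age"]["type"])
```

Because the document is an ordinary nested ``dict``, it can be inspected with standard dictionary lookups and serialised to JSON without extra tooling.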