# File Loader

We need to know certain information about the source files to accurately process them. The two key bits of information are:

* Which table the file represents. The 903 returns consist of a number of tables, but we only need the HEADERS and EPISODES tables.
* Which year the table represents. Strictly speaking, we don't need the year itself, but we do need to group the files together and to know their chronological order, and the year satisfies both requirements.
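
As a rough illustration of these two pieces of metadata, the sketch below infers a year and a table from a path such as `2017/header.csv`. The `infer_metadata` helper, the `FileInfo` tuple, the `TABLE_NAMES` mapping and the assumed folder/file naming convention are all hypothetical and exist only for this example; they are not the package's actual detection logic.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Hypothetical mapping from file stem to table name (an assumption for this sketch).
TABLE_NAMES = {"header": "HEADERS", "episodes": "EPISODES"}


class FileInfo(NamedTuple):
    year: Optional[int]
    table: Optional[str]


def infer_metadata(path: str) -> FileInfo:
    """Guess the year and table from a path like '2017/header.csv'."""
    parsed = PurePosixPath(path)
    # Take the first four-digit path component as the year, if there is one.
    year = next((int(p) for p in parsed.parts if re.fullmatch(r"\d{4}", p)), None)
    # Tables we do not need (reviews, oc2, ...) map to None.
    table = TABLE_NAMES.get(parsed.stem.lower())
    return FileInfo(year, table)


header_info = infer_metadata("2017/header.csv")
other_info = infer_metadata("2017/oc2.csv")
```

Returning `None` for an unrecognised table or a missing year keeps the helper total, so callers can decide whether to skip the file or raise an error.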

To supply files in many different scenarios, we have abstracted file access into a class called the [DataStore](../csdmpy/datastore/_api.py).

Custom DataStores can be created for custom scenarios, but we provide default implementations that access files from the filesystem or from zip files.
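
To give a feel for what a custom store might involve, here is a minimal in-memory sketch. It does not subclass the real `DataStore`; the class and dataclass names (`InMemoryDataStore`, `SketchDataFile`, `SketchMetadata`) are hypothetical, and the only parts modelled are a `files` property and an `open()` method, which are the pieces this document mentions. The actual interface in `_api.py` may well differ.

```python
import io
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class SketchMetadata:
    name: str
    size: int
    year: Optional[int] = None
    table: Optional[str] = None


@dataclass(frozen=True)
class SketchDataFile:
    name: str
    metadata: SketchMetadata


class InMemoryDataStore:
    """Hypothetical store that serves files from a dict of path -> bytes."""

    def __init__(self, contents: Dict[str, bytes]):
        self._contents = dict(contents)

    @property
    def files(self) -> Iterator[SketchDataFile]:
        for path, data in self._contents.items():
            folder, _, filename = path.rpartition("/")
            # Assumption: the parent folder is named after the year.
            year = int(folder) if folder.isdigit() else None
            yield SketchDataFile(path, SketchMetadata(filename, len(data), year))

    def open(self, path: str) -> io.BytesIO:
        # Hand back a readable binary stream for the requested file.
        return io.BytesIO(self._contents[path])


store = InMemoryDataStore({"2017/header.csv": b"CHILD,SEX\n62551,2\n"})
listing = list(store.files)
first_line = store.open("2017/header.csv").readline()
```

A store like this can be handy in unit tests, since it needs neither a filesystem nor a zip archive.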

We use [pyfs](https://pypi.org/project/pyfs/) to allow access to a multitude of filesystems, such as local files, (S)FTP, SharePoint, etc. We also provide a `sample` filesystem to load the included sample files.

In [1]:
from cs_demand_model import fs_datastore

datastore = fs_datastore("sample://v1.zip")

The DataStore gives access to the provided files and their metadata through the `files` property.

In [2]:
[f for f in datastore.files]

[DataFile(name='2017/reviews.csv', metadata=Metadata(name='reviews.csv', size=114495, year=2017, table=None)),
 DataFile(name='2017/missing.csv', metadata=Metadata(name='missing.csv', size=162, year=2017, table=None)),
 DataFile(name='2017/header.csv', metadata=Metadata(name='header.csv', size=33527, year=2017, table=None)),
 DataFile(name='2017/oc2.csv', metadata=Metadata(name='oc2.csv', size=26156, year=2017, table=None)),
 DataFile(name='2017/oc3.csv', metadata=Metadata(name='oc3.csv', size=3586, year=2017, table=None)),
 DataFile(name='2017/episodes.csv', metadata=Metadata(name='episodes.csv', size=103508, year=2017, table=None)),
 DataFile(name='2017/placed_for_adoption.csv', metadata=Metadata(name='placed_for_adoption.csv', size=1, year=2017, table=None)),
 DataFile(name='2017/previous_permanence.csv', metadata=Metadata(name='previous_permanence.csv', size=18771, year=2017, table=None)),
 DataFile(name='2017/ad1.csv', metadata=Metadata(name='ad1.csv', size=1, year=2017, table=None)),
 ...]
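
Because each entry carries its year, a listing like the one above can be grouped by year and walked in chronological order. The sketch below uses stand-in `(name, year)` tuples rather than real `DataFile` objects, and the `NEEDED` set is a hypothetical filter based on the sample file names, so treat it as an illustration only.

```python
from collections import defaultdict

# Stand-in (name, year) pairs; on real DataFile objects these would be
# f.name and f.metadata.year, as shown in the listing above.
entries = [
    ("2020/header.csv", 2020),
    ("2017/reviews.csv", 2017),
    ("2017/header.csv", 2017),
    ("2017/episodes.csv", 2017),
]

# Only the header and episodes files are needed downstream.
NEEDED = {"header.csv", "episodes.csv"}

by_year = defaultdict(list)
for name, year in entries:
    if name.rsplit("/", 1)[-1] in NEEDED:
        by_year[year].append(name)

# Visit the years in chronological order.
ordered = [(year, sorted(by_year[year])) for year in sorted(by_year)]
```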

Files can be opened using the `datastore.open()` method, but often we simply want to read a file as a dataframe:

In [3]:
datastore.to_dataframe("2020/header.csv")

| Unnamed: 0 | CHILD | SEX | DOB | ETHNIC | UPN | MOTHER | MC_DOB |
|---|---|---|---|---|---|---|---|
| 0 | 62551 | 2 | 18/03/2005 | OOTH | R084010198416 | | |
| 1 | 185992 | 2 | 29/07/2009 | BAFR | U054252063976 | | |
| 2 | 225242 | 2 | 27/11/2005 | AIND | E024501213825 | | |
| 3 | 158853 | 2 | 11/04/2008 | WBRI | E043584864318 | | |
| 4 | 53693 | 1 | 28/12/2007 | MWBA | E072016439902 | | |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 221 | 988280 | 2 | 09/04/2008 | AOTH | P008164917573 | | |
| 222 | 247698 | 2 | 18/05/2004 | WBRI | L073046772523 | | |
| 223 | 528962 | 2 | 10/11/2004 | BAFR | T061793193706 | | |
| 224 | 88024 | 1 | 05/02/2008 | MWAS | S051040920649 | | |
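
The `DOB` values in the output above appear to be day/month/year strings (for example `29/07/2009`). As a sketch of turning them into proper dates, the snippet below parses two rows copied from that output using only the standard library. `parse_rows` is a hypothetical helper for illustration and is not part of the package.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the output above, in CSV form.
SAMPLE = """CHILD,SEX,DOB,ETHNIC,UPN,MOTHER,MC_DOB
62551,2,18/03/2005,OOTH,R084010198416,,
185992,2,29/07/2009,BAFR,U054252063976,,
"""


def parse_rows(text: str) -> list:
    """Read CSV text and convert DOB from dd/mm/yyyy strings to date objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["DOB"] = datetime.strptime(row["DOB"], "%d/%m/%Y").date()
        rows.append(row)
    return rows


rows = parse_rows(SAMPLE)
```

Stating the format explicitly avoids month/day ambiguity for dates such as `05/02/2008`.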
