# Particle Filter using external data
    author: P. Ternes

This text documents the changes made to the Particle Filter code so that it can perform data assimilation with external data. After describing those changes, it presents an experiment that runs the particle filter on external data.

The particle filter **code** that uses external data can be obtained [`here`](../../stationsim/particle_filter_gcs.py).

A **notebook** with more information about the Particle Filter can be found [`here`](../pf_experiments/pf_experiments_plots.ipynb).

## Necessary files

This version of the particle filter reads data stored outside the code. For the code to work correctly, the external data must be organized according to the instructions below:
* create a folder to store the external data;
* create a file named <b>activation.dat</b> inside this folder;
* create <b>N</b> files named <b>frame_i.dat</b> inside this folder, where i runs from 1 to N (the maximum number of frames observed).

Below you will find a detailed description of how each of these files should be organized.

### activation.dat file

The activation.dat file contains information about each pedestrian. The fields are:
* <b>pedestrianID:</b> one unique ID that identifies the pedestrian. Integer number;
* <b>time_activation:</b> the time that each pedestrian enters the environment through any gate. Real number;
* <b>gate_in:</b> the gate ID through which the pedestrian enters the environment. Integer number;
* <b>gate_out:</b> the gate ID through which the pedestrian leaves the environment. Integer number;
* <b>speed:</b> the average speed of the pedestrian. Real number;

The first line of the file is a comment line that begins with <b>#</b>, followed by the header.
The following lines contain the fields listed above, in that order, separated only by spaces. If a piece of information is not available, keep its column and fill it with <i>None</i>.

The file must have the structure represented below:


|# pedestrianID | time_activation | gate_in | gate_out | speed  |
|:--------------|:----------------|:--------|:---------|:-------|
|0              |24.33457         |6        |2         |1.7377  |
|1              |13.3245          |8        |4         |0.31979 |
|$\vdots$       |$\vdots$         |$\vdots$ |$\vdots$  |$\vdots$|
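
As a minimal sketch of the layout described above, the snippet below writes a two-pedestrian activation.dat with NumPy and reads it back with `np.loadtxt(..., unpack=True)`, the same call that `set_initial_conditions()` uses later in this text. The temporary folder and the sample rows (copied from the table) are assumptions made only for illustration.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample rows: pedestrianID, time_activation, gate_in, gate_out, speed
rows = np.array([
    [0, 24.33457, 6, 2, 1.7377],
    [1, 13.3245, 8, 4, 0.31979],
])

data_dir = tempfile.mkdtemp()  # stand-in for the external data folder
file_name = os.path.join(data_dir, 'activation.dat')

# savetxt prefixes the header with '# ', which yields the required comment line
np.savetxt(file_name, rows,
           fmt=['%d', '%.5f', '%d', '%d', '%.5f'],
           delimiter=' ',
           header='pedestrianID time_activation gate_in gate_out speed')

# loadtxt skips lines that start with '#' and returns one array per column
ID, time, gate_in, gate_out, speed = np.loadtxt(file_name, unpack=True)
print(ID, time, gate_in, gate_out, speed)
```

All columns come back as floating-point arrays, which is why the method shown later converts the gate IDs with `int(...)`.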

### frame_i.dat

The frame_i.dat file contains the position of each active pedestrian in the i-th frame. The fields are:
* <b>pedestrianID:</b> one unique ID that identifies the pedestrian. The same ID used in the activation.dat file. Integer number;
* <b>x:</b> the pedestrian's $x$ position in the i-th frame. Real number;
* <b>y:</b> the pedestrian's $y$ position in the i-th frame. Real number;

The first line of the file is a comment line that begins with <b>#</b>, followed by the header.
The following lines contain the fields listed above, in that order, separated only by spaces. Only active pedestrians are listed in each frame_i.dat file. If there is no active pedestrian in the i-th frame, save the file with the header comment only.

The file must have the structure represented below:

| #pedestrianID | x       | y      |
|:--------------|:--------|:-------|
|55             |198.872  |124.2976|
|58             |168.27   |13.1725 |
|$\vdots$       |$\vdots$ |$\vdots$|
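
The following sketch writes frame files in this layout with plain Python I/O, including the header-only case for a frame without active pedestrians. The helper name `write_frame`, the `positions` dictionary, and the temporary folder are hypothetical. Rows are written in increasing ID order because the `predict` method shown later in this text appears to consume the rows sequentially while looping over the agents.

```python
import os
import tempfile


def write_frame(data_dir, frame, positions):
    """Write frame_<frame>.dat; `positions` maps pedestrianID -> (x, y)."""
    file_name = os.path.join(data_dir, 'frame_{}.dat'.format(frame))
    with open(file_name, 'w') as f:
        f.write('# pedestrianID x y\n')  # header comment, always present
        for ped_id, (x, y) in sorted(positions.items()):
            f.write('{} {} {}\n'.format(ped_id, x, y))
    return file_name


data_dir = tempfile.mkdtemp()  # stand-in for the external data folder
busy = write_frame(data_dir, 100, {58: (168.27, 13.1725), 55: (198.872, 124.2976)})
empty = write_frame(data_dir, 200, {})  # no active pedestrian: header only

with open(busy) as f:
    print(f.read())
```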

Because data assimilation is not carried out at every time step, it is not necessary to create files for all frames. The only files really needed are those for the frames where data assimilation occurs, which are determined by the <b>resample_window</b> parameter. For example, if resample_window = 100, the required files are frame_100.dat, frame_200.dat, etc.
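
Following the rule stated above, the required file names can be listed with a short helper. This is only a sketch: `required_frame_files` and its `template` argument are hypothetical. The template is a parameter because the `predict` method shown later in this text builds the name as `'frame_' + str(time) + '.0.dat'`, so the actual names on disk may need the `.0` suffix.

```python
def required_frame_files(resample_window, last_frame, template='frame_{}.dat'):
    """Names of the frame files read at each assimilation step."""
    return [template.format(frame)
            for frame in range(resample_window, last_frame + 1, resample_window)]


print(required_frame_files(100, 350))
# → ['frame_100.dat', 'frame_200.dat', 'frame_300.dat']
```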

## Activation

To use external data, pass the particle filter a parameter with the key <b>'do_external_data'</b> and the value <b><i>True</i></b>. It is also necessary to fill the <b>'external_info'</b> list with the data directory and two <b><i>booleans</i></b> that control the use of the external speed and the external exit gate (in this order).

For example:

In [None]:
filter_params = {'do_external_data': True,
                 'external_info': ['external_data_dir/', True, True]  # [data dir, use external speed?, use external gate_out?]
                 }

To use pseudo-truth data, pass the value <i>False</i> for the key <i>'do_external_data'</i>.

## Initial conditions

After the base_model object is created inside the particle filter, each agent must be given its desired initial condition. To do that, we created the <b>set_initial_conditions()</b> method, which reads the <b>external_data_dir/activation.dat</b> file:

In [None]:
def set_initial_conditions(self):
    '''
     Use an external file to set some of the agents' parameter values;
     self.external_info[0]: directory name
     self.external_info[1]: boolean to use speed
     self.external_info[2]: boolean to use gate_out
    '''
    
    file_name = self.external_info[0] + 'activation.dat'
    ID, time, gateIn, gateOut, speed_ = np.loadtxt(file_name,unpack=True)
    for i in range(self.base_model.pop_total):
        self.base_model.agents[i].steps_activate = time[i]
        self.estimate_model.agents[i].step_start = time[i]
        self.base_model.agents[i].gate_in = int(gateIn[i])
        for model in self.models:
            model.agents[i].steps_activate = time[i]
            model.agents[i].gate_in = int(gateIn[i])
        if self.external_info[1]:
            self.base_model.agents[i].speed = speed_[i]
            for model in self.models:
                model.agents[i].speed = speed_[i]
        if self.external_info[2]:
            self.base_model.agents[i].loc_desire = self.base_model.agents[i].set_agent_location(int(gateOut[i]))
            for model in self.models:
                model.agents[i].loc_desire = self.base_model.agents[i].loc_desire

    '''
     If the speed is not obtained from external data, generate new speeds
     for all agents in all particles.
    '''
    if not self.external_info[1]:
        for model in self.models:
            for agent in model.agents:
                speed_max = 0
                while speed_max <= model.speed_min:
                    speed_max = np.random.normal(model.speed_mean, model.speed_std)
                agent.speeds = np.arange(speed_max, model.speed_min, - model.speed_step)
                agent.speed = np.random.choice((agent.speeds))

    '''
     If the gate_out is not obtained from external data, generate new 
     gate_out for all agents in all particles.
    '''
    if not self.external_info[2]:
        for model in self.models:
            for agent in model.agents:
                agent.set_gate_out()
                agent.loc_desire = agent.set_agent_location(agent.gate_out)
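
The fallback used when the speed is not read from the external data can be isolated as a standalone sketch: a maximum speed is drawn from a normal distribution until it exceeds the minimum speed, a descending grid of candidate speeds is built, and one candidate is picked at random. The function name `draw_speed` and the parameter values are hypothetical, and a seeded `numpy.random.Generator` replaces the global `np.random` calls of the method above so that the example is reproducible.

```python
import numpy as np


def draw_speed(rng, speed_mean, speed_std, speed_min, speed_step):
    """Draw one agent speed following the fallback branch shown above."""
    speed_max = 0.0
    # Reject draws that do not exceed the minimum speed
    while speed_max <= speed_min:
        speed_max = rng.normal(speed_mean, speed_std)
    # Candidate speeds from speed_max down towards speed_min
    speeds = np.arange(speed_max, speed_min, -speed_step)
    return rng.choice(speeds), speeds


rng = np.random.default_rng(0)
# Hypothetical parameter values, not taken from the model
speed, speeds = draw_speed(rng, speed_mean=1.0, speed_std=0.5,
                           speed_min=0.2, speed_step=0.1)
print(speed, speeds)
```

Because the rejection loop guarantees `speed_max > speed_min`, the candidate grid is never empty and `choice` always has something to pick.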

## Predict

The <b>predict</b> method determines the state of the particles.

It also determines the state of the base_model object. To use external data, this method has been rewritten:


In [None]:
def predict(self, numiter=1):
    '''
    Predict

    DESCRIPTION
    Increment time. Step the base model. Use a multiprocessing method to step
    particle models, set the particle states as the agent
    locations with some added noise, and reassign the
    locations of the particle agents using the new particle
    states. We extract the models and states from the stepped
    particles variable.

    :param numiter: The number of iterations to step (usually either 1 or the resample window)
    '''

    time = self.time - numiter

    if self.do_external_data:
        for i in range(numiter):
            time = time + 1
            file_name = self.external_info[0] + 'frame_' + str(time)+ '.0.dat'
            try:
                agentID, x, y = np.loadtxt(file_name,unpack=True)
                j = 0
                for agent in self.base_model.agents:
                    if (agent.unique_id in agentID):

                        agent.status = 1
                        agent.location = [x[j], y[j]]
                        j += 1
                    elif (agent.status == 1):
                        agent.status = 2
            except TypeError:
                '''
                This error occurs when only one agent is active. In
                this case, the data is read as a float instead of an
                array.
                '''
                for agent in self.base_model.agents:
                    if (agent.unique_id == agentID):
                        agent.status = 1
                        agent.location = [x, y]
                    elif (agent.status == 1):
                        agent.status = 2
            except ValueError:
                '''
                 This error occurs when there is no active agent in
                 the frame.
                 - Deactivate all active agents.
                '''
                for agent in self.base_model.agents:
                    if (agent.status == 1):
                        agent.status = 2

            except OSError:
                '''
                This error occurs when there is no external file to
                read. It should only occur at the end of the simulation.
                - Deactivate all agent.
                '''
                for agent in self.base_model.agents:
                    agent.status = 2

    else:
        for i in range(numiter):
            self.base_model.step()

    stepped_particles = self.pool.starmap(ParticleFilter.step_particle, list(zip( \
        range(self.number_of_particles),  # Particle numbers (in integer)
        [m for m in self.models],  # Associated Models (a Model object)
        [numiter] * self.number_of_particles,  # Number of iterations to step each particle (an integer)
        [self.particle_std] * self.number_of_particles,  # Particle std (for adding noise) (a float)
        [s.shape for s in self.states],  # Shape (for adding noise) (a tuple)
    )))

    self.models = [stepped_particles[i][0] for i in range(len(stepped_particles))]
    self.states = np.array([stepped_particles[i][1] for i in range(len(stepped_particles))])
    self.get_state_estimate()

### Warning!
Note that file reading relies on several different exceptions. This is useful because the files to be read have different shapes. The drawback of this approach is that, if the files are not organized correctly, the code will not report the error.
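
One possible way to reduce this risk, sketched here as an alternative and not as the code used by the filter, is to read every frame with `np.loadtxt(..., ndmin=2)` so that files with zero, one, or many rows all yield a two-dimensional array, and to check the number of columns explicitly. The helper name `read_frame` and the sample files are hypothetical.

```python
import os
import tempfile
import warnings

import numpy as np


def read_frame(file_name):
    """Return (ids, xy) for one frame file, or None if the file is missing."""
    if not os.path.exists(file_name):
        return None
    with warnings.catch_warnings():
        warnings.simplefilter('ignore')  # loadtxt warns on header-only files
        data = np.loadtxt(file_name, ndmin=2)
    if data.size == 0:  # header only: no active pedestrian
        return np.empty(0, dtype=int), np.empty((0, 2))
    if data.shape[1] != 3:  # malformed file: report it instead of hiding it
        raise ValueError('{}: expected 3 columns, found {}'.format(
            file_name, data.shape[1]))
    return data[:, 0].astype(int), data[:, 1:]


data_dir = tempfile.mkdtemp()  # stand-in for the external data folder
samples = {'many': '55 198.872 124.2976\n58 168.27 13.1725\n',
           'one': '55 198.872 124.2976\n',
           'none': ''}
for name, body in samples.items():
    path = os.path.join(data_dir, name + '.dat')
    with open(path, 'w') as f:
        f.write('# pedestrianID x y\n' + body)
    ids, xy = read_frame(path)
    print(name, ids.tolist(), xy.shape)
```

With this approach the three cases are told apart by the array size, so a file with the wrong number of columns raises an error that names the file.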

# Experiments

The sections below give instructions for running experiments with external data and present the results.

## Initialization

Determine the path for the model and the filter.


In [None]:
import sys
sys.path.append('../../stationsim')
from particle_filter_gcs import ParticleFilter
from stationsim_gcs_model import Model


Define the parameters necessary to initialize the model.

In [None]:
model_params = {'pop_total': 274,
                'batch_iterations': 3100,
                'step_limit': 3100,
                'birth_rate': 25./15,
                'do_history': False,
                'do_print': False,
                'station': 'Grand_Central'}

Define the parameters necessary to initialize the Particle Filter. Different experiments have different sets of parameters.

In [None]:
filter_params = {'number_of_runs': 1,
                 'particle_std': 1.0,
                 'model_std': 1.0,
                 'do_save': True,
                 'plot_save': False,
                 'agents_to_visualise': 1,
                 'do_ani': False,
                 'show_ani': False,
                 'do_external_data': True,
                 'resample_window': 100,
                 'number_of_particles': 5000,
                 'multi_step': False, # False for plot distance as function of pedestrian
                 'do_resample': True, # True for experiments with D.A.
                 'pf_method': 'sir', # ('sir' or 'hybrid') important if do_resample is True. 
                 'external_info': ['../GCT_final_real_data/', False, False]}

Create and run the Particle Filter object

In [None]:
# Create the Particle Filter object
pf = ParticleFilter(Model, model_params, filter_params)

# Run the particle filter
result = pf.step()
pf.pool.close()

## Results

Use the appropriate method to generate the desired results.
For the result below, we used the <b>get_distace_plot</b> method defined inside the <b>estimate_model</b> object.


In [None]:
numiter = 1
if pf.multi_step:
    numiter = pf.resample_window

pf.estimate_model.get_distace_plot(filter_params['external_info'][0]+'frame_', 1500, 3000, numiter)

### SIR Particle Filter

In this experiment we used real data to perform the data assimilation with the SIR PF method. To do this, we used exactly the set of parameters presented above. The result is:

![Experiment_SIR-PF_GCS](figs/Fig10.png)

### More results

You can find more results in this [`paper`](update with stationsim_gcs paper).