# GSoC Project Proposal
##  Add Variational Inference Interface to PyMC4
`Variational Inference` (VI) is a powerful technique for approximating the posterior distribution p(z | x) with a simpler, tractable distribution q(z). Generally, VI aims at minimising the `KL divergence` (relative entropy) between the approximation and the posterior. But the KL divergence suffers from several statistical disadvantages, as described in the [OPVI](https://arxiv.org/abs/1610.09033) (Operator Variational Inference) paper. This project plans to implement the OPVI framework in `PyMC4` using `tf` and `tfp`. The framework optimizes `operator objectives`, which need not depend explicitly on the approximating density.

## Interface Design
The interface design follows the implementation of the OPVI framework in `PyMC3` and builds on `tf` and `tfp`.


Any operator objective has three components: an operator, a family of test functions, and a distance function.
In the objective, the operator and test function combine as
$
\begin{align}
(O^{p,q}f_{\theta})(z)
\end{align}
$
inside an expectation, such that values close to 0 indicate that q(z) is close to p. The class below will implement this:
```python
class Operator:
    
    def apply(self):
        '''
        Logic to write the operator in terms of p(x, z) and
        q(z) using Symbolic PyMC.

        TODO: Figure this out by referring to the
        Symbolic PyMC guidelines.
        '''
```
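
As a minimal, framework-free sketch (plain Python rather than `tf`), the KL special case of an operator could look like the following. It assumes the KL operator takes the form log q(z) - log p(z) and ignores the test function. `KLOperator`, `normal_logpdf`, and `expectation` are hypothetical names used only for illustration; they are not PyMC3 or PyMC4 API.

```python
import math
import random


def normal_logpdf(z, mu, sigma):
    """Log density of a univariate Normal(mu, sigma) evaluated at z."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((z - mu) / sigma) ** 2)


class KLOperator:
    """Hypothetical sketch: (O^{p,q} f)(z) = log q(z) - log p(z) for every f."""

    def __init__(self, logp, logq):
        self.logp = logp  # log density of the target p
        self.logq = logq  # log density of the approximation q

    def apply(self, z):
        # The value is 0 wherever q and p agree.
        return self.logq(z) - self.logp(z)


def expectation(operator, samples):
    """Monte Carlo estimate of E_q[(O^{p,q} f)(z)] from samples drawn from q."""
    return sum(operator.apply(z) for z in samples) / len(samples)


# q = Normal(0, 1) and p = Normal(1, 1); the exact KL(q || p) is 0.5.
op = KLOperator(logp=lambda z: normal_logpdf(z, 1.0, 1.0),
                logq=lambda z: normal_logpdf(z, 0.0, 1.0))
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
print(expectation(op, samples))  # noisy estimate of KL(q || p)
```

With samples drawn from q, the expectation estimates KL(q || p), so it shrinks towards 0 as q approaches p.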

A test function maps realizations of the latent variables to real vectors. The Stein operator is implemented with memoization.
```python
class TestFunction:
    '''
    TODO: Figure out what WithMemoization does and the equations used in the RBF kernel.
    '''
```
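
For the RBF kernel mentioned in the TODO, a minimal plain-Python sketch of the kernel and its gradient is given here. It assumes the common form k(x, y) = exp(-||x - y||^2 / (2 h^2)) with a fixed bandwidth h. The function names and the `bandwidth` parameter are hypothetical, and the bandwidth rule PyMC3 actually uses may differ.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def rbf_kernel_grad_x(x, y, bandwidth=1.0):
    """Gradient of k(x, y) with respect to x, one entry per dimension."""
    k = rbf_kernel(x, y, bandwidth)
    return [-(a - b) / bandwidth ** 2 * k for a, b in zip(x, y)]


print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))         # 1.0 for identical points
print(rbf_kernel_grad_x([1.0, 0.0], [0.0, 0.0]))  # gradient points back towards y
```

Stein-based methods such as SVGD need both the kernel value and its gradient, which is why the two are sketched together.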

The step function aims to solve the min-max problem
$
\begin{align}
\mathbf{\lambda^{*}} = \inf_{\lambda} \sup_{\theta} t(\mathbb{E}_{\lambda}[(O^{p,q}f_{\theta})(z)])
\end{align}
$
Here t is the distance function, λ parameterises q, and θ parameterises the test function.
```python
class ObjectiveFunction:
    
    def step_function(self):
        '''
        Step function called at each optimization step.
        Plan to take gradients using `tf.GradientTape` (`tf.gradients`
        is not supported in TF2 eager mode) and apply updates using
        `tf.keras.optimizers`.

        TODO: PyMC3 uses Theano's SharedVariable concept to take gradients.
        Try to map this onto `tf.Variable`.
        '''
```
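
To make the role of the step function concrete, here is a small plain-Python sketch of the outer minimisation only. It assumes the KL case, where there is no inner supremum over θ, and it uses a closed-form KL between univariate normals as the objective. Finite differences and plain gradient descent stand in for automatic differentiation and a `tf.keras.optimizers` optimizer. All names are hypothetical.

```python
import math


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(q || p) between two univariate normal distributions."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def step_function(params, objective, learning_rate=0.1, eps=1e-6):
    """One gradient-descent step; gradients come from central differences."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((objective(up) - objective(down)) / (2.0 * eps))
    return [p - learning_rate * g for p, g in zip(params, grads)]


def objective(params):
    """KL from q = Normal(mu, exp(log_sigma)) to the target Normal(2, 0.5)."""
    mu, log_sigma = params
    return kl_normal(mu, math.exp(log_sigma), 2.0, 0.5)


params = [0.0, 0.0]  # initial [mu, log_sigma]
for _ in range(500):
    params = step_function(params, objective)

print(params[0], math.exp(params[1]))  # approaches mu = 2.0 and sigma = 0.5
```

Optimising `log_sigma` rather than `sigma` keeps the scale positive without any explicit constraint.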

To wrap variables for approximations:
```python
class Approximation:
    '''
    TODO: Understand Groups, Mean Field Approximation for ADVI and Full Rank 
    Approximation for FR-ADVI. 
    '''
```
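
The difference between the two approximation families can be sketched in plain Python. The sketch assumes that mean-field draws each coordinate independently as mu + sigma * eps, while full-rank uses a lower-triangular Cholesky factor L and draws mu + L eps. The function names are hypothetical and do not correspond to the planned class interface.

```python
import random


def sample_mean_field(mu, sigma, rng):
    """Draw z with independent coordinates: z_i = mu_i + sigma_i * eps_i."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


def sample_full_rank(mu, chol, rng):
    """Draw z = mu + L @ eps, where chol is a lower-triangular matrix L."""
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(chol[i][j] * eps[j] for j in range(i + 1))
            for i, m in enumerate(mu)]


rng = random.Random(0)
mu = [1.0, 2.0]
# Mean-field: two independent coordinates.
print(sample_mean_field(mu, [0.5, 2.0], rng))
# Full-rank: the off-diagonal entry 0.9 correlates the two coordinates.
print(sample_full_rank(mu, [[1.0, 0.0], [0.9, 0.4]], rng))
```

With a diagonal `chol`, the full-rank sampler reduces to the mean-field one, which is one way to see mean-field as a special case of full-rank.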

To build the objective function and fit the model:
```python
class Inference:
    '''
    Base class of ADVI and FR-ADVI to fit the model. 
    
    TODO: Read the ADVI paper and understand the equations
    '''
    pass
```
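
The objective that ADVI maximises is the ELBO, E_q[log p(x, z)] + H[q]. A rough plain-Python sketch of a Monte Carlo estimate follows, assuming a mean-field Gaussian q and ignoring the transformation to unconstrained space that ADVI applies. `gaussian_entropy`, `elbo_estimate`, and `standard_normal_logpdf` are hypothetical names.

```python
import math
import random


def gaussian_entropy(sigmas):
    """Entropy of a diagonal Gaussian with the given standard deviations."""
    d = len(sigmas)
    return (0.5 * d * math.log(2.0 * math.pi * math.e)
            + sum(math.log(s) for s in sigmas))


def elbo_estimate(log_joint, mu, sigmas, n_samples=1000, seed=0):
    """Monte Carlo ELBO: E_q[log p(x, z)] + H[q], with z = mu + sigma * eps."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigmas)]
        total += log_joint(z)
    return total / n_samples + gaussian_entropy(sigmas)


def standard_normal_logpdf(z):
    """Toy target: a standard normal log density standing in for log p(x, z)."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * v * v for v in z)


good = elbo_estimate(standard_normal_logpdf, [0.0], [1.0], n_samples=20000)
bad = elbo_estimate(standard_normal_logpdf, [3.0], [1.0], n_samples=20000)
print(good, bad)  # the ELBO is higher when q matches the target
```

Because the toy target is already normalised, the ELBO equals minus the KL divergence here, so it is close to 0 when q matches the target and drops as q moves away.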



## Project Timeline
 - Pre GSoC Period: Feb 25 - March 15 <br/>
Complete the TODOs listed in Interface Design. Most of the Variational Inference functionality revolves around Symbolic PyMC, and it will take me time to understand its full usage. I will make the design progressively more concrete as time goes on and then submit the proposal.


 - Student Application and Review Application Period: March 16 - April 27 <br />
I will become more familiar with the PyMC3 codebase and begin implementing a few base classes.


 - Community Bonding Period: April 27 - May 18 <br/> 
I will finalize the implementation design for the Variational Inference interface by seeking reviews from mentors and the community, and by referring to the OPVI and ADVI papers.


 - Week 1 - 2: May 18 - May 31 <br/>
The actual coding begins. By this time, I will have a fair idea of how to work with Groups. I will implement the Groups (MeanField, FullRank) and Approximation classes in this interval and test them rigorously with pytest.


 - Week 3 - 4: June 1 - June 14 <br/>
During this interval, I will implement the Operator, Objective Function and Test Function classes. As above, I will write test cases for every piece of functionality. I will also prepare a progress report for the phase 1 evaluation.


 - Evaluation Phase 1: June 15 - June 19 <br/>
At this time, my focus will be on writing visual illustrations/notebooks for use cases of these base classes.


 - Week 5 - 6: June 20 - July 3 <br/>
Having implemented the base classes, I will write the Inference and ADVI approximation classes, again testing with pytest.


 - Week 7: July 4 - July 12 <br/>
I will implement Full Rank ADVI in this interval and prepare progress reports.


 - Evaluation Phase 2: July 13 - July 17 <br/>
I will again write notebooks/blog posts explaining good use cases.


 - Week 8 - 9: July 18 - July 31 <br />
I am not sure whether I will be able to complete the implementation of Full Rank ADVI. I will nevertheless keep working to finish it and to complete the testing and documentation.


 - Week 10: August 1 - August 9 <br />
Writing illustrations/blogs.


 - Final Submission of Code: August 10


 - Post GSoC Period: August 11 onwards <br/>
I will learn and implement the SVGD and ASVGD inference algorithms and become a permanent contributor to the organisation.

## Contributing to PyMC4
1. Pull Request [#220](https://github.com/pymc-devs/pymc4/pull/220) (Merged): Add AutoRegressive distribution - <br/>
This PR added the AutoRegressive distribution by wrapping the `sts.AutoRegressive` model. The main task was to call the `make_state_space_model` method with suitable arguments to capture the underlying `tfd.LinearGaussianStateSpaceModel`. It took a lot of debugging to make this AR class compatible with PyMC4.

2. Pull Request [#215](https://github.com/pymc-devs/pymc4/pull/215) (Merged): Add default transform(sigmoid) for Unit Continuous Distribution - <br/>
This PR added a sigmoid transform to the Unit Continuous Distribution. To make the default transform compatible with PyMC4, I also added a Sigmoid transform that uses the `tfb.Sigmoid` bijector.

3. Pull Request [#212](https://github.com/pymc-devs/pymc4/pull/212) (Merged): Update design_guide notebook - <br/>
This small PR fixed typos and variable names in `pymc4_design_guide.ipynb`.

4. Issue [#211](https://github.com/pymc-devs/pymc4/issues/211) (Closed): Installation issues - <br/>
I encountered installation issues while setting up the working environment using pip, so I opened this issue, and Luciano Paz helped me out with other ways of installing PyMC4.

## Personal Projects
1. Send to S3 - [Github](https://github.com/Sayam753/SendToS3) <br/>
This Python project sends backup files to an AWS S3 bucket using Boto3. Files are located by regex, and the resulting logs are emailed using smtplib.

2. Osint-Spy - [Github](https://github.com/Sayam753/OSINT-SPY) <br/>
This Python project performs OSINT scans on emails, domains, IPs, organizations, etc.
This information can be used by data miners or penetration testers to find in-depth information about their target.

3. Turbofan Degradation - [Colab](https://colab.research.google.com/drive/1sCZcJSmRarYbQKDYeaqiLnzXyzFolRC0) <br/>
Implemented a deep-learning-based Encoder-Decoder model ([paper](https://www.researchgate.net/publication/336150924_A_Novel_Deep_Learning-Based_Encoder-Decoder_Model_for_Remaining_Useful_Life_Prediction)) for analysing the turbofan degradation dataset provided by NASA.

4. Neural Network from Scratch - [Colab](https://colab.research.google.com/drive/1iU38tTeEvUI_sjt6vVAuhedMWOPUdr5E) <br/>
Implemented a deep neural network from scratch in numpy with custom hyperparameters.



## Basic/Contact Information
I am Sayam Kumar from the Indian Institute of Information Technology Sri City, India. I am a second-year undergraduate pursuing a Bachelor's in Computer Science Engineering. I have been programming for a long time, and I am interested in working on this project to expand my knowledge of Machine Learning and Bayesian Statistics. This is my first time participating in GSoC. As I have no other work planned for the summer, I can spend 60-70 hours per week working on the project. Along the way, I will write progress reports and extensive documentation of the implementation of the various classes, which will help when submitting reports to my mentor and Google at evaluation time.<br/>
Resume - [Google drive link](https://drive.google.com/file/d/1mrNC3qtieWKH1i2mhqH6xiFCt-EwGJ0b/view?usp=sharing), [Github link](https://github.com/Sayam753/Resume) <br/>
Contact details - [Gmail](mailto:sayamkumar049@gmail.com), [Yahoo](mailto:sayamkumar753@yahoo.in), [Github](https://github.com/Sayam753), [LinkedIn](https://www.linkedin.com/in/sayam049/), [Twitter](https://twitter.com/sayamkumar753), +91 9815247310 (Mobile)
