# Vitamin D production through *Saccharomyces cerevisiae*

## 1. Introduction

### 1.1 Literature review of the compound

The product of this project is the fat-soluble (source) vitamin D3, most commonly referred to simply as vitamin D. The vitamin is essential for the intestinal absorption of calcium, magnesium and phosphate. It can be produced in the skin through a reaction that depends on UVB light from sunlight (Holick 1980), and it is also present in egg yolks and fish (Brown 1999).

Vitamin D is most commonly used as a supplement, especially in countries in the Northern Hemisphere, where long winter nights leave little sunlight available. However, deficiency is also widespread in Asia, especially in China. More than 10% of Europeans have severe vitamin D deficiency, while non-severe deficiency affects <20% of Northern Europeans and 30-60% of the rest of Europe.

A newer use, still as a supplement, is taking vitamin D as a preventative against COVID-19, a respiratory infection that became a pandemic in 2019 and is still spreading across the world, with over one million dead. Some trials have found no effect of taking vitamin D, while other researchers are still conducting experiments to elucidate the matter further. Nonetheless, vitamin D is known to reduce the risk of respiratory infections in general.

While research on vitamin D and its effect on COVID-19 is as yet inconclusive, the market has reacted to the increased interest. For example, the Danish broadcaster DR (Danmarks Radio) recently published an article reporting that the consumption of vitamin D has increased by up to 50% since September 2020.

Furthermore, with rising wages in Asia, an increase in the use of vitamin D supplements has been noted, which contributes to the growth of the vitamin D market. The compound annual growth rate (CAGR) of the vitamin D market is around 7% for 2019-2024. The market is quite competitive, without one dominant player. Instead, multiple companies produce and sell vitamin D (Pfizer, GlaxoSmithKline, etc.).
While the companies do not release exact information on their production methods, patents suggest a purely chemical rather than biological production (patent WO2001072286A1).
Although industrial production is chemical, vitamin D is, as mentioned, also produced in the human body, and it has been produced experimentally in *Saccharomyces cerevisiae*.


### 1.2 Literature review of the cell factory

Budding yeast (*Saccharomyces cerevisiae*) is one of the most used microorganisms in human history [51, 52]. The earliest evidence of the use of microorganisms is suspected to involve some kind of yeast, plausibly *S. cerevisiae*, and dates back to Ancient Egypt [53]. Today, this wonderful organism is used in a wide variety of production processes, with products ranging from ethanol to biomass, and from small organic compounds to cancer medication [54].

One important difference between *S. cerevisiae* and the commonly used bacterial hosts is that yeasts are eukaryotes, which means they have, among other things, cellular compartments. This adds considerable complexity, as transport between compartments needs to be factored in, but it also allows more complex pathways to be split between locations where the local environment might be more hospitable. Furthermore, because the organism is so widely used, it has been extensively studied and is popular in industry, making it a good candidate for the production of vitamins and other biochemicals as well as proteins [52].
The yeast also has many well-established DNA cassettes, making construction of the strain more efficient [54].

While the organism has many advantages and is widely used, it has some drawbacks as well. Usually, industry and academia have used glucose and disaccharides as substrate and carbon source; however, this is neither environmentally sustainable nor cheap. From these perspectives, it would be more advantageous to use a strain that can naturally use cellulose, lignin or xylose, which are abundant in nature, as substrate.
Another issue with using *S. cerevisiae* can be the unwanted production of ethanol (even under aerobic conditions, known as the Crabtree effect), since the cell factory will divert carbon and energy towards the production of ethanol rather than the desired metabolite [52].

Of course, other microorganisms can be used to produce chemicals, such as prokaryotic organisms or other yeasts. However, as outlined above, *S. cerevisiae* allows for higher complexity, especially when working with more complex molecules such as vitamins and hormones (e.g. insulin). Other yeasts could be used; however, *S. cerevisiae* remains the most researched yeast and is, as mentioned, a model organism.

Yeast does not naturally have the biosynthetic pathway for the vitamin D precursor. However, other researchers have been able to insert the pathway into yeast with some success. For example, Guo et al. (2018) [55] produced 7-dehydrocholesterol through metabolic engineering by overexpressing genes in the mevalonate pathway and introducing the gene for Δ24-dehydrocholesterol reductase from *Gallus gallus* (chicken). The researchers furthermore deleted some genes and introduced specific promoters, leading to a titer of 1.07 g/L of the vitamin D precursor.

[51] Industrial Microbiology | Boundless Microbiology. (n.d.). Retrieved December 1, 2020, from https://courses.lumenlearning.com/boundless-microbiology/chapter/industrial-microbiology/
[52] Kavšček, M., Stražar, M., Curk, T., Natter, K., & Petrovič, U. (2015, June 30). Yeast as a cell factory: Current state and perspectives. Microbial Cell Factories, Vol. 14, p. 94. https://doi.org/10.1186/s12934-015-0281-x
[53] Ltd, N. (2004). h2g2 - The History of Bread Yeast - Edited Entry. Retrieved 1 December 2020, from https://h2g2.com/edited_entry/A2791820
[54] Żymańczyk-Duda, E., Brzezińska-Rodak, M., Klimek-Ochab, M., Duda, M., & Zerka, A. (2017). Yeast as a Versatile Tool in Biotechnology. In Yeast - Industrial Applications. https://doi.org/10.5772/intechopen.70130
[55] Guo, X. J., Xiao, W. H., Wang, Y., Yao, M. D., Zeng, B. X., Liu, H., … Yuan, Y. J. (2018). Metabolic engineering of Saccharomyces cerevisiae for 7-dehydrocholesterol overproduction. Biotechnology for Biofuels, 11(1), 192. https://doi.org/10.1186/s13068-018-1194-9

## 2. Problem definition

This project focuses on engineering *S. cerevisiae* as a cell factory to produce a vitamin D precursor. Currently, vitamin D is mainly produced by extracting the precursor from sheep's wool. With a yeast-based method, however, the production would be vegan, as well as possibly more optimised and less dependent on external factors such as healthy sheep.

To produce the vitamin D precursor in *S. cerevisiae*, the relevant genes must first be introduced into an existing genome-scale metabolic model (GSM). The model must then be optimized with a focus on both biomass production and vitamin D precursor production, such that the yield can become industrially feasible or near-feasible.


## 3. Selection and assessment of existing GSM

For our project, we need a metabolic model of yeast. We have selected four genome-scale metabolic models (GEMs) of the yeast *S. cerevisiae*. All of these will be evaluated using Memote (source) to determine which to use going forward.

The first two models are taken from BiGG. They are iMM904 [56,57] and iND750 [58,59], and both are medium-sized models in terms of the number of metabolites and reactions. The next model is Yeast8 [60], which is an expansive yeast model. This model was first published in 2019 by Lu et al. and has been updated in an open-source manner using GitHub. The last model is ecYeast8 [60], which is an enzyme-constrained variant of Yeast8. This means that it should give predictions that are more accurate with regard to real-life observations.

The iMM904 and iND750 models are available from their respective BiGG pages [56,58]. The Yeast8 model files can be found in the project's [GitHub repository](https://github.com/SysBioChalmers/yeast-GEM) and in the Lu et al. paper from 2019 [60]. The version of the ecYeast8 model we are using is sourced from the project's [GitHub repository](https://github.com/SysBioChalmers/GECKO/tree/master/models/prot_constrained/ecYeastGEM_prot).

### 3.1 First validation of the models
The first step is to load the models into our repository. The .xml files have been downloaded from the respective sources and collected in the 'data' folder.

As mentioned in the README, the code will be executed in the adjacent file called '[0-Code_Memote](0-Code_Memote.ipynb)'. From the loading of the models, we can assess the number of metabolites, reactions and compartments. This information is collected in Table 1. 

The models can subsequently be analyzed by Memote to get an evaluation of their quality. The Memote files are gathered in the 'memote' folder and their final scores are also shown in Table 1.
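The counts reported in Table 1 can be gathered with a small helper. The following is a minimal sketch, not the code from the notebook: it assumes that each loaded model exposes `metabolites`, `reactions` and `compartments` collections, as cobrapy `Model` objects do. The name `summarize_model` and the stand-in toy object are hypothetical and only serve to keep the example self-contained.

```python
from types import SimpleNamespace


def summarize_model(model):
    """Count metabolites, reactions and compartments of a model.

    Works with any object that has `metabolites`, `reactions` and
    `compartments` attributes supporting len(), e.g. a cobrapy Model.
    """
    return {
        "metabolites": len(model.metabolites),
        "reactions": len(model.reactions),
        "compartments": len(model.compartments),
    }


# Stand-in for a loaded model (hypothetical toy data, not one of the four GEMs).
toy_model = SimpleNamespace(
    metabolites=["glc", "g6p", "pyr"],
    reactions=["HEX1", "PYK"],
    compartments={"c": "cytoplasm", "e": "extracellular"},
)

summary = summarize_model(toy_model)
print(summary)
```

In the notebook, the same helper could be applied to the object returned by `cobra.io.read_sbml_model` for each .xml file in the 'data' folder.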


**Table 1: Overview of metabolic models**

| Model | # Metabolites | # Reactions | # Compartments | Memote analysis |
| --- | --- | --- | --- | --- |
| iMM904 | 1226 | 1577 | 8 | 85% |
| iND750 | 1059 | 1266 | 8 | 86% |
| Yeast8 | 2742 | 4058 | 14 | 65% |
| ecYeast8 | 4180 | 8144 | 14 | 16% |


As we can see from the Memote analysis, the iND750 model has the highest score with 86% and the ecYeast8 model has the lowest with 16%. However, it is not quite fair to compare the ecYeast8 model with the rest, as this model has added enzyme constraints. The addition of enzyme constraints aims to make the model agree better with experimental data, but it does not fit the criteria of the Memote analysis, which can explain why it has a much lower score.

When we look at which model to use in this project, we want to choose a model that is as accurate to real life as possible, while also being useful and applicable. We have therefore chosen to work with the Yeast8 model, as it has an acceptable quality score and also represents a modern GEM that incorporates recent advances in the construction of GEMs.
Furthermore, for the addition of the heterologous pathway in the next section, we will need ferricytochrome b5, which does not exist in either the iMM904 or the iND750 model.

[56] BiGG Model iMM904. (n.d.). Retrieved December 1, 2020, from http://bigg.ucsd.edu/models/iMM904
[57] Mo, M. L., Palsson, B., & Herrgård, M. J. (2009). Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Systems Biology, 3. https://doi.org/10.1186/1752-0509-3-37
[58] BiGG Model iND750. (n.d.). Retrieved December 1, 2020, from http://bigg.ucsd.edu/models/iND750
[59] Duarte, N. C., Herrgård, M. J., & Palsson, B. (2004). Reconstruction and validation of Saccharomyces cerevisiae iND750, a fully compartmentalized genome-scale metabolic model. Genome Research, 14(7), 1298–1309. https://doi.org/10.1101/gr.2250904
[60] Lu, H., Li, F., Sánchez, B. J., Zhu, Z., Li, G., Domenzain, I., … Nielsen, J. (2019). A consensus S. cerevisiae metabolic model Yeast8 and its ecosystem for comprehensively probing cellular metabolism. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-11581-3


## 4. Computer-Aided Cell Factory Engineering
In the engineering of the cell factory, we will first have to add our heterologous pathway for the production of 7-dehydrocholesterol. Following this, we will explore different strategies for optimization of the cell factory.

### 4.1 Addition of a heterologous pathway

Based on the KEGG database, one way to produce 7-dehydrocholesterol is through the [human biosynthesis pathway](https://www.genome.jp/kegg-bin/show_module?hsa_M00101+1718). Our yeast model can natively produce zymosterol, and from this we can produce 7-dehydrocholesterol. (insert figure?)

To get the pathway from zymosterol to 7-dehydrocholesterol, we will need three reactions: [R07498](https://www.genome.jp/dbget-bin/www_bget?R07498), [R03353](https://www.genome.jp/dbget-bin/www_bget?R03353), and [R07215](https://www.genome.jp/dbget-bin/www_bget?R07215).

The reactions look like the following:
<ol>
    <li>R07498: Zymosterol + NADPH + H+ <=> 5alpha-Cholest-8-en-3beta-ol + NADP+ </li>
    <li>R03353: Lathosterol <=> 5alpha-Cholest-8-en-3beta-ol </li>
    <li>R07215: Lathosterol + 2 Ferrocytochrome b5 + Oxygen + 2 H+ <=> 7-Dehydrocholesterol + 2 Ferricytochrome b5 + 2 H2O </li>
</ol>
To add these reactions to the model, we will need the metabolites' names in the model. They are as follows:

**Table 2: Metabolites in heterologous pathway**

| Metabolite | Model identifier |
|---|---|
| Zymosterol | s_1569[c] |
| NADPH | s_1212[c] |
| H+ | s_0794[c] |
| NADP+ | s_1207[c] |
| Ferrocytochrome b5 | s_0710[m] |
| Ferricytochrome b5 | s_0709[m] |
| Oxygen | s_1275[c] |
| H2O | s_0803[c] |
| 5alpha-Cholest-8-en-3beta-ol | Undefined |
| Lathosterol | Undefined |
| 7-Dehydrocholesterol | Undefined |


We will now add these to the model in turn, as seen in the code file called '[1-Code_Heterologous-pathway](1-Code_Heterologous-pathway.ipynb)'. 
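Before the reactions are added, it can help to check what the pathway costs overall. The sketch below sums the three KEGG reactions listed above into one net conversion from zymosterol to 7-dehydrocholesterol. It is an illustration in plain Python and not the notebook code; the dictionary layout and the function name `net_stoichiometry` are assumptions made for this example, and R03353 is used in the reverse of the direction in which it is written above.

```python
from collections import Counter

# Stoichiometry of each step in the direction of 7-dehydrocholesterol
# synthesis (negative = consumed, positive = produced).
# R03353 is written as lathosterol <=> 5alpha-cholest-8-en-3beta-ol,
# so it is used in reverse here.
PATHWAY = {
    "R07498": {
        "zymosterol": -1, "NADPH": -1, "H+": -1,
        "5alpha-cholest-8-en-3beta-ol": 1, "NADP+": 1,
    },
    "R03353_reverse": {
        "5alpha-cholest-8-en-3beta-ol": -1, "lathosterol": 1,
    },
    "R07215": {
        "lathosterol": -1, "ferrocytochrome b5": -2, "oxygen": -1, "H+": -2,
        "7-dehydrocholesterol": 1, "ferricytochrome b5": 2, "H2O": 2,
    },
}


def net_stoichiometry(reactions):
    """Sum the coefficients of all reactions and drop cancelled intermediates."""
    total = Counter()
    for stoichiometry in reactions.values():
        for metabolite, coefficient in stoichiometry.items():
            total[metabolite] += coefficient
    return {met: coeff for met, coeff in total.items() if coeff != 0}


net = net_stoichiometry(PATHWAY)
for metabolite, coefficient in sorted(net.items(), key=lambda item: item[1]):
    print(f"{coefficient:+d} {metabolite}")
```

The two intermediates cancel out, so each 7-dehydrocholesterol formed from zymosterol requires one NADPH, one oxygen, three H+ and two ferrocytochrome b5.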

From the output of the notebook, we can see that our model is now capable of producing 7-dehydrocholesterol, with a maximal flux through its final reaction of approximately 0.0412 (in the model's flux units, conventionally mmol gDW⁻¹ h⁻¹). This iteration of the model with the added biosynthetic pathway is saved as "yeastGEM_het.xml" (for **het**erologous pathway) and can be found with the other model files in the 'data' folder.
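To put the maximal flux into perspective, it can be converted into a yield on glucose. The sketch below is a back-of-the-envelope calculation under two stated assumptions that should be checked against the model: that the flux is given in mmol gDW⁻¹ h⁻¹, and that glucose uptake is limited to 1 mmol gDW⁻¹ h⁻¹. The carbon counts (6 for glucose, 27 for 7-dehydrocholesterol) follow from the molecular formulas; the function name `carbon_yield` is hypothetical.

```python
def carbon_yield(product_flux, substrate_flux, product_carbons, substrate_carbons):
    """Fraction of the substrate carbon that ends up in the product."""
    return (product_flux * product_carbons) / (substrate_flux * substrate_carbons)


max_7dhc_flux = 0.0412  # maximal flux reported in the text
glucose_uptake = 1.0    # ASSUMPTION: uptake bound of the glucose exchange reaction

molar_yield = max_7dhc_flux / glucose_uptake
c_yield = carbon_yield(
    max_7dhc_flux, glucose_uptake, product_carbons=27, substrate_carbons=6
)

print(f"molar yield:  {molar_yield:.4f} mol 7-DHC per mol glucose")
print(f"carbon yield: {c_yield:.1%} of the glucose carbon")
```

If the uptake bound in the model differs from the assumed value, only `glucose_uptake` needs to be changed.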

### 4.2 Phenotypic phase plane analysis and knock-outs
Now that our model is able to produce 7-dehydrocholesterol, we are interested in optimizing production. Currently, there is a trade-off between growth and production of our target compound, as seen in Figure 1 below. This means that a growth-optimizing yeast has no incentive to produce the compound, as doing so diminishes its growth.
To address this, we will explore different optimization strategies, starting with selected gene knock-outs.

![image info](figures/initial_ppp.png)

**Figure 1: Phenotypic phase plane analysis of the initial metabolic model**
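The kind of trade-off shown in Figure 1 can be illustrated with a deliberately small example. The sketch below is not the Yeast8 calculation: it assumes a hypothetical network in which a single substrate is split between biomass and product, with made-up yields, and computes the maximal product flux for each fixed growth rate, which is what a production envelope reports. On the full model, a function such as cobrapy's `production_envelope` would take this role.

```python
def production_envelope_toy(uptake, biomass_yield, product_yield, points=5):
    """Maximal product flux at evenly spaced growth rates for a toy network.

    Substrate balance: uptake = growth / biomass_yield + product / product_yield,
    so for a fixed growth rate the best achievable product flux is
    product_yield * (uptake - growth / biomass_yield).
    """
    max_growth = uptake * biomass_yield
    envelope = []
    for i in range(points):
        growth = max_growth * i / (points - 1)
        product = product_yield * (uptake - growth / biomass_yield)
        envelope.append((growth, max(product, 0.0)))
    return envelope


# Hypothetical parameters, chosen only to make the trade-off visible.
envelope = production_envelope_toy(uptake=1.0, biomass_yield=0.1, product_yield=0.04)
for growth, product in envelope:
    print(f"growth = {growth:.3f}   max product = {product:.4f}")
```

In this toy case every unit of substrate spent on growth is lost to production, so the envelope is a straight line from maximal production at zero growth down to zero production at maximal growth.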

For the knock-outs, we start by implementing those that have been experimentally tested in XXX et al. (source). The code for this section can be seen in '[2-Code_PPP](2-Code_PPP.ipynb)'.
The genes ERG5 and ERG6 and the GAL genes GAL7, GAL1, and GAL10 were knocked out in sequence. However, they showed no significant increase in either maximal growth rate or maximal 7-dehydrocholesterol production rate.
The PPPs of the tested knock-outs can be seen in Figure 2.

![image info](figures/ERG5_knockout.png) 
![image info](figures/ERG6_knockout.png)

![image info](figures/GAL_knockout.png)
![image info](figures/GAL_ERG6_knockout.png)

**Figure 2: Phenotypic phase planes of knockout strains**

### 4.3 Optimization of carbon sources

To optimize our cell factory, we can also examine which carbon source it uses. In the '[3-Code_Carbon-sources](3-Code_Carbon-sources.ipynb)' document, we go through each individual carbon source and compute the maximal growth rate and the maximal 7-dehydrocholesterol production rate. The resulting scatterplot of the maximal growth rate and production rate can be seen in Figure 3.

![image info](figures/carbon_sources.png)

**Figure 3: Plot showing the maximal growth rate and maximal 7-dehydrocholesterol production rate for all carbon sources in the model**

As mentioned in the code document, sucrose gives double the growth and production of glucose. This is because sucrose is a disaccharide composed of one glucose and one fructose molecule.
Therefore, the same result can be achieved by increasing the glucose available in the medium.
However, this model is specifically glucose-limited, and thus it is built around low glucose availability.
We are interested in keeping the model under glucose limitation, as excess glucose can cause ethanol production and repress metabolite production. This is part of the so-called Crabtree effect (doi:10.1371/journal.pone.0116942).

Furthermore, glucose is a cheaper substrate than sucrose, which makes it favorable in large-scale production.

For these reasons, we choose to keep glucose as the carbon source, at the same availability.
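The factor of two between sucrose and glucose follows directly from the carbon content of the two sugars. The short calculation below makes this explicit; it assumes that the same molar uptake bound is applied to each carbon source in turn, which is an assumption about how the scan was set up, and the helper `carbon_supply` is a hypothetical name.

```python
# Carbon atoms per molecule: glucose C6H12O6, fructose C6H12O6, sucrose C12H22O11.
CARBONS = {"glucose": 6, "fructose": 6, "sucrose": 12}


def carbon_supply(substrate, uptake_rate):
    """Carbon delivered per unit time for a given molar uptake rate."""
    return CARBONS[substrate] * uptake_rate


same_uptake = 1.0  # ASSUMPTION: identical molar uptake bound for both sugars

ratio = carbon_supply("sucrose", same_uptake) / carbon_supply("glucose", same_uptake)
print(f"sucrose delivers {ratio:.1f}x the carbon of glucose at equal molar uptake")

# Doubling the glucose uptake supplies the same amount of carbon as sucrose.
print(carbon_supply("glucose", 2 * same_uptake) == carbon_supply("sucrose", same_uptake))
```

This is why raising the glucose availability would reproduce the sucrose result without changing the substrate.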

### 4.4 Automatic strategies
Automatic optimization strategies like OptGene (https://www.researchgate.net/publication/277216985_OptGene_a_framework_for_in_silico_metabolic_engineering) and OptKnock (http://cameo.bio/05-predict-gene-knockout-strategies.html) are tools that can provide a starting point for computational engineering of cell factories. However, they can be very time-intensive, especially when dealing with a large metabolic model such as the Yeast8 model.

We have tried to use OptGene to optimize the production of 7-dehydrocholesterol in our GEM. The code for this optimization strategy can be found in '[4-Code_Automatic-optimization](4-Code_Automatic-optimization.ipynb)'.

So far, this method has not given any actionable information with regard to the optimization of our cell factory. This might be caused by the large number of genes, reactions, and metabolites in the model. In a genetic algorithm like OptGene, this makes the time required for each iteration quite large, which limits the usefulness of the method for large models like ours.

### 4.5 FSEOF analysis

The technique called flux scanning based on enforced objective flux (FSEOF) can be used to identify reactions whose fluxes rise or fall as the model is forced to carry stepwise higher flux toward the compound of interest while still optimizing growth. The code for our FSEOF analysis can be seen in the document '[5-Code_FSEOF](5-Code_FSEOF.ipynb)'.
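The evaluation step of such a scan can be sketched in a few lines. The example below assumes that the fluxes have already been recorded for a series of enforced production levels and only shows how reactions could then be classified by the trend of their flux. It is not the notebook code: the flux values are made up, and the reaction names and the function `classify_fluxes` are hypothetical.

```python
def linear_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def classify_fluxes(enforced, flux_table, tol=1e-9):
    """Label each reaction by how its flux responds to the enforced product flux."""
    labels = {}
    for reaction, fluxes in flux_table.items():
        slope = linear_slope(enforced, fluxes)
        if slope > tol:
            labels[reaction] = "increases"
        elif slope < -tol:
            labels[reaction] = "decreases"
        else:
            labels[reaction] = "unchanged"
    return labels


# Hypothetical scan results: one flux value per enforced production level.
enforced_product = [0.00, 0.01, 0.02, 0.03]
toy_fluxes = {
    "sterol_step": [0.00, 0.01, 0.02, 0.03],
    "phosphate_uptake": [0.40, 0.30, 0.20, 0.10],
    "maintenance": [0.70, 0.70, 0.70, 0.70],
}

labels = classify_fluxes(enforced_product, toy_fluxes)
print(labels)
```

Reactions labelled as increasing are candidates for overexpression, while decreasing ones are candidates for down-regulation or knock-out.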

Our FSEOF analysis found a large negative correlation between phosphate H+ symport and production of 7-dehydrocholesterol, as seen in Figure 4.

![image info](figures/FSEOF.png)

**Figure 4: Plot showing the changes in fluxes with increased forced production of 7-dehydrocholesterol**

Following this, we tried both knocking out the phosphate transport reaction (Figure 5, left) and simply limiting the amount of phosphate in the medium (Figure 5, right). We found that a complete knockout did nothing to alter the production pattern seen in the PPP analysis. Under phosphate limitation, however, we see a deviation from the earlier PPP results: the production capability for 7-dehydrocholesterol is dramatically limited, and the shape of the phase plane is clearly distinct from all other PPPs produced so far.

![image info](figures/phospate_knockout_ppp.png) ![image info](figures/phosphate_minimization_ppp.png)

**Figure 5: Plot showing the PPP of a knockout in the phosphate uptake reaction (left) and a reduction in extracellular phosphate (right).**

### 4.6 Co-factor swap

The co-factors used in the various reactions of the metabolic model can be swapped to leave more room for the synthesis of the target compound. In our case, sterol synthesis requires reductive power from NADPH, which means that a benefit might be seen by shifting other reactions to using NADH instead. 
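What a swap does to a single reaction can be shown on its stoichiometry. The sketch below replaces NADPH and NADP+ with NADH and NAD+ in one reaction, following the direction described above. It is a plain-Python illustration with a hypothetical reaction and a hypothetical function name `swap_cofactors`, and it is not the optimization algorithm used in the notebook, which additionally has to decide which reactions to swap.

```python
# Cofactor pairs to exchange (NADPH-dependent -> NADH-dependent).
SWAP = {"NADPH": "NADH", "NADP+": "NAD+"}


def swap_cofactors(stoichiometry, swap=SWAP):
    """Return a copy of a reaction's stoichiometry with the cofactors replaced."""
    swapped = {}
    for metabolite, coefficient in stoichiometry.items():
        target = swap.get(metabolite, metabolite)
        # Add up coefficients in case the target cofactor is already present.
        swapped[target] = swapped.get(target, 0) + coefficient
    return swapped


# Hypothetical NADPH-dependent reductase (negative = consumed, positive = produced).
reaction = {"substrate": -1, "NADPH": -1, "H+": -1, "product": 1, "NADP+": 1}

swapped = swap_cofactors(reaction)
print(swapped)
```

The original dictionary is left untouched, so the swapped and unswapped versions of a reaction can be compared side by side.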

The code for this section can be seen in the file '[6-Code_Co-factor](6-Code_Co-factor.ipynb)'.

We have tried to exchange NADH for NADPH, but when the algorithm was allowed to run, the resulting iteration of the model was unable to produce biomass or 7-dehydrocholesterol. This strategy was therefore abandoned until further advances in our understanding of the method can be made.

### 4.7 Dynamic FBA

This does not work yet

## 5. Discussion (<500 words)

## 6. Conclusion (<200 words)

## References