# [Title]

## 1. Introduction

### 1.1 Literature review of Artemisinin (<500 words)
Artemisinin and its derivatives are isolated from the plant sweet wormwood (_Artemisia annua_), although some other plants also have production potential [3]. Artemisinin plays a core role in the fight against malaria and is recommended for use in combination with a partner drug. In this combination therapy, the role of artemisinin (or its derivatives) is to minimise the biomass of _Plasmodium_ (the parasite that causes malaria) in the blood of infected patients [1].

Artemisinin (see Figure 1) is a sesquiterpene lactone: it is built from three isoprene units and contains an ester that is part of a carbon ring structure. The exact mode of action of artemisinin is still debated, but it is thought to be connected to the endoperoxide bridge [3,4]. A two-step mode of action has been suggested. First, activation of artemisinin (or its derivatives) by iron, either from heme or in molecular form, causes the formation of free radicals and alkylating intermediates. Second, reaction between these newly formed free radical species and membrane-bound proteins specifically associated with malaria confers the anti-malarial function [5].

![Artemisinin](figures/Artemisinin.png)\
Figure 1

In the native host, artemisinin production branches off from general terpenoid biosynthesis. Farnesyl diphosphate (FPP) is converted to amorpha-4,11-diene, which is the substrate of a P450 enzyme that oxidises the compound through several steps. The final step in the pathway, converting dihydroartemisinic acid to artemisinin, is non-enzymatic and occurs spontaneously in the presence of UV light and oxygen [6]. The biosynthetic pathway can be seen in Figure 2.

![Pathway](figures/Pathway.png)\
Figure 2

The global artemisinin market is experiencing significant growth, with a 19.1% compound annual growth rate (CAGR) projected until 2031. The market size was USD 64 million in 2021 and is expected to reach USD 367.3 million in 2031. This growth is driven by several factors, including increased access to artemisinin-based combination therapies (ACTs) in malaria-endemic regions and the development of novel antimalarial medications [1].

Malaria remains a significant health concern in endemic regions, and climate change is expected to exacerbate the issue [1]. Despite progress in reducing malaria deaths, the number of cases has increased, further emphasising the importance of effective anti-malarial treatments [2]. The market is segmented by the source of artemisinin: extracted from _A. annua_ or produced semi-synthetically. The market is still in its early stages due to a limited number of global manufacturers [1]. However, the potential for growth and the global demand for effective anti-malarial treatments may attract new players to the market.

Challenges affecting market growth include side effects of antimalarial drugs, the prevalence of counterfeit and substandard drugs, supply chain disruptions, programmatic uncertainties, and a demand-supply mismatch. Despite these challenges, the market's growth potential is encouraged by increasing demand from malaria-endemic nations, along with increased R&D activities, improved medical infrastructure, and government initiatives for innovative anti-malarial medications [1]. Overall, the global artemisinin market is poised for substantial growth, driven by the pressing need for effective malaria treatment, especially in regions where malaria remains a significant public health issue.


### 1.2 Literature review of the cell factory (<500 words)
_Bacillus subtilis_ is a Gram-positive bacterium widely used for industrial production of both chemicals and proteins [MR1]. With multiple _B. subtilis_ processes generally recognized as safe (GRAS), it is a useful production host for many nutritional supplements and food additives [MR2]. Furthermore, because it does not produce toxins [MR3], it is a good candidate for production of pharmaceuticals. Additionally, the bacterium grows fast, can grow on cheap substrates, and is known for its good secretion capabilities [MR1]. All of these properties make _B. subtilis_ a suitable cell factory for large-scale production.

Recent advances in synthetic biology have led to the development of new tools for genetic engineering of _B. subtilis_ [MR2]. These tools can facilitate the integration of genes from the biosynthetic pathway of artemisinin and the subsequent optimization of expression that is likely required to achieve high production. These advances make _B. subtilis_ a better choice of production host than, for example, the commonly used cell factory _Escherichia coli_. _E. coli_ can produce endotoxins in certain environments, which deprives many _E. coli_ processes of GRAS status [MR4].

Semisynthetic artemisinin production has already been developed in the yeast _Saccharomyces cerevisiae_ [MR6]; however, this host has a slower growth rate than _B. subtilis_. Furthermore, an earlier precursor of artemisinin, amorphadiene, has been successfully produced in _B. subtilis_ at much higher yields than obtained previously in _S. cerevisiae_ [MR5]. Thus, _B. subtilis_ is a more attractive production host than _S. cerevisiae_.

However, there are drawbacks to using _B. subtilis_. The genes required for artemisinin production come from a plant, which may cause _B. subtilis_ to struggle with functional production of these heterologous enzymes. Protein folding issues could arise in this new host as folding is not compartmentalised in prokaryotic systems. Furthermore, genes might not be expressed in feasible amounts [MR5]. These challenges need to be taken into account when utilising this host for semisynthetic artemisinin production.

## 2. Problem definition (<300 words)
The critical anti-malarial drug artemisinin is derived from the sweet wormwood plant, where the precursor dihydroartemisinic acid (DHAA) is converted to artemisinin by UV light and oxygen (see Figure 2). However, production from plants is too time-consuming to meet global demand. Engineered cell factories offer a promising solution to this problem, enabling rapid, scalable, and more environmentally friendly production of plant products like artemisinin. Recent studies have shown successful production of the early artemisinin precursor, amorphadiene, using CRISPR-Cas9 in _B. subtilis_, making this a promising cell factory [1,2].

This project focuses on engineering _B. subtilis_ as a cell factory to produce the late artemisinin precursor, DHAA. This is achieved by introducing the relevant reactions into an existing genome-scale metabolic (GSM) model, generating a heterologous pathway. Computational modelling and simulations are then used to design and optimise the cell factory, considering theoretical maximum yields, up- and downregulation targets, phenotypic phase planes, and dynamic flux balance analysis. Moreover, we will compute gene knockout strategies to identify genes whose deletion improves DHAA production, and investigate whether a co-factor swap can increase production. The combination of these approaches aims to create an efficient and sustainable platform for DHAA production, for later conversion into artemisinin, addressing a crucial need in global healthcare.

## 3. Selection and assessment of existing GSM model (<500 words)

Currently there are 10 GSM models for _B. subtilis_ (strain 168), all based on one of the two earliest published GSM models, by Oh et al. (2007) or Henry et al. (2009). An overview of the GSM models for _B. subtilis_ is given in Table 1 below.

| GSM models         | Genes | Reactions | Metabolites | Refs.                  | Description                                                                                                                                                                                                       |
|--------------|-------|-----------|-------------|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Center model | 534   | 563       | 456         | Goelzer et al. (2008)  | Manually curated metabolic, genetic, and regulatory networks of central metabolism using published data and expert knowledge.                                                                                     |
| iYO844       | 844   | 1021      | 988         | Oh et al. (2007)       | An in silico (i) generated model based on genomic, biochemical, and physiological information and high-throughput phenotyping experiments.                                                                        |
| iBsu1103     | 1103  | 1437      | 1139        | Henry et al. (2009)    | An in silico model based on SEED annotations  and fitting of estimated thermodynamic data with experimental data combined with a directionality prediction method.                                                |
| iBsu1103v2   | 1103  | 1451      | 1156        | Tanaka et al. (2013)   | Improved version of iBsu1103, refined by systematically mapping non-essential regions with deletion mutants and fitting the model to predict the outcomes of these interval deletions.                           |
| iBsu1147     | 1147  | 1742      | 1456        | Hao et al. (2013)      | Model constructed from genomic and bibliomic data, the model iBsu1103 and subsequent modifications made in accordance with simulations related to biomass and ATP synthesis.                                      |
| iBsu1144     | 1144  | 1955      | 1103        | Kocabaş et al. (2017)  | Model constructed through thermodynamic analyses and elimination of unconnected reactions in the renewed _B. subtilis_ reaction network, BsRN-2016.                                                               |
| eciYO844     | 844   | 1021      | 988         | Massaiu et al. (2019)  | An enzyme-constrained (ec) model made by integrating enzyme restrictions in iYO844 based on publicly available proteomics and enzyme kinetic parameters for central carbon metabolic reactions, the GECKO method. |
| etiBsu1209   | 1209  | 1948      | 1595        | Bi et al. (2023)       | Updated version of iBsu1147 utilising machine learning tools to fill the gaps and additional integration of enzymatic constraints (e), thermodynamic constraints (t), and transcriptional regulatory networks.    |
| ecBSU1       | 1155  | 3307      | 1459        | Wu et al. (2023)       | Updated version of iBsu1147 through gene-protein-reaction updates, biomass reaction standardization etc. and subsequent application of enzymatic constraints.                                                     |
| iBB1018      | 1018  | 1577      | 1291        | Blázquez et al. (2023) | Constructed on the basis of iBsu1103v2, subjected to manual curation with updated biochemical and physiological knowledge and manual gap-filling analysis.                                                        |

Table 1

Despite the many GSM models presented, only models iYO844, iBsu1103, iBsu1147 and iBB1018 were publicly available and could be used for further Memote analysis to assess the quality of the models. The key results from Memote are summarised in Table 2 below.


| GSM model | Total score [%] | Stoichiometric Consistency [%] | Mass Balance [%] | Metabolite Connectivity [%] |
|:--------:|:---------------:|:------------------------------:|:----------------:|:----------------------------:|
|  iYO844  |        86       |               100              |       94.4       |              100             |
| iBsu1103 |        34       |               100              |        0.0       |             99.6             |
| iBsu1147 |        32       |               0.0              |       97.2       |             99.0             |
|  iBB1018 |        71       |               0.0              |       92.4       |              100             |

Table 2

Of the four, only iYO844 scores highly on both stoichiometric consistency (100%) and mass balance (94.4%), which is necessary for our further implementation and analysis. Therefore, iYO844 will be used as the GSM model in which DHAA production is implemented and optimised. The full results of the Memote analysis are saved as HTML files within the Memote folder.
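The selection logic can be written down as a small filter over the Table 2 scores. This is only a sketch: the function name `select_models` and the 90% mass-balance cut-off are illustrative assumptions, not thresholds defined by Memote or used in the project notebooks.

```python
# Memote scores copied from Table 2 (all values in percent).
MEMOTE_SCORES = {
    "iYO844":   {"total": 86, "consistency": 100.0, "mass_balance": 94.4},
    "iBsu1103": {"total": 34, "consistency": 100.0, "mass_balance": 0.0},
    "iBsu1147": {"total": 32, "consistency": 0.0,   "mass_balance": 97.2},
    "iBB1018":  {"total": 71, "consistency": 0.0,   "mass_balance": 92.4},
}


def select_models(scores, min_consistency=100.0, min_mass_balance=90.0):
    """Return the models that pass both cut-offs, best total score first.

    Both cut-offs are hypothetical defaults chosen for this illustration.
    """
    passing = [
        name
        for name, s in scores.items()
        if s["consistency"] >= min_consistency
        and s["mass_balance"] >= min_mass_balance
    ]
    return sorted(passing, key=lambda name: scores[name]["total"], reverse=True)


print(select_models(MEMOTE_SCORES))
```

Relaxing `min_consistency` to 0 would let iBB1018 and iBsu1147 through as well, which shows that stoichiometric consistency is the criterion that separates iYO844 from the other mass-balanced models.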


## 4. Computer-Aided Cell Factory Engineering (<1500 words if Category II project)

### Incorporation of Heterologous Genes in the iYO844 Model

To produce the artemisinin precursor DHAA in a heterologous host, it is necessary to find a native metabolite that can serve as the starting point of the heterologous pathway.

_B. subtilis_ produces the metabolite FPP exclusively via the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. This was verified for the GSM model iYO844. To increase the supply of FPP, the starting point of our pathway, in _B. subtilis_, an extra reaction that converts geranyl diphosphate (GPP) to FPP is added [Pramastya et al.]. The enzyme responsible for the reaction, farnesyl pyrophosphate synthase (FPPS), comes from _S. cerevisiae_.

For production of DHAA in _B. subtilis_, the following enzymes should be incorporated into the GSM model: amorphadiene synthase (ADS), amorphadiene oxidase (CYP71AV1), alcohol dehydrogenase in combination with amorphadiene oxidase (ADH1_CYP71AV), artemisinic aldehyde double-bond reductase (DBR2), and aldehyde dehydrogenase 1 in combination with amorphadiene oxidase, which is used in two different reactions in our model (ALDH1_CYP71AV1 and ALDH1_CYP71AV1_2) (see Figure 2). In _A. annua_, the ALDH1_CYP71AV1 enzyme that converts dihydroartemisinic aldehyde into DHAA also converts artemisinic aldehyde into artemisinic acid. To account for this loss of flux away from DHAA, the competing reaction is also incorporated into the GSM model under the identifier ALDH1_CYP71AV1_2. Enzymes responsible for the reactions from amorphadiene to DHAA come from _A. annua_, while the ADS enzyme responsible for converting FPP into amorphadiene comes from _S. cerevisiae_.
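The seven added reactions can be summarised as a small substrate-to-product network. The sketch below is a simplification that ignores co-factors and stoichiometry, so it is not the model definition itself. The intermediate names not spelled out in the text (artemisinic alcohol in particular) follow the usual description of the _A. annua_ pathway and should be checked against Figure 2, and the helper `find_route` is a hypothetical name introduced for illustration.

```python
from collections import deque

# (reaction id, main substrate, main product); co-factors are omitted on purpose.
PATHWAY = [
    ("FPPS", "geranyl diphosphate", "farnesyl diphosphate"),
    ("ADS", "farnesyl diphosphate", "amorphadiene"),
    ("CYP71AV1", "amorphadiene", "artemisinic alcohol"),
    ("ADH1_CYP71AV", "artemisinic alcohol", "artemisinic aldehyde"),
    ("DBR2", "artemisinic aldehyde", "dihydroartemisinic aldehyde"),
    ("ALDH1_CYP71AV1", "dihydroartemisinic aldehyde", "dihydroartemisinic acid"),
    # Competing branch that drains flux away from DHAA.
    ("ALDH1_CYP71AV1_2", "artemisinic aldehyde", "artemisinic acid"),
]


def find_route(reactions, start, target):
    """Breadth-first search; return the reaction ids leading from start to target."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        metabolite, route = queue.popleft()
        if metabolite == target:
            return route
        for reaction_id, substrate, product in reactions:
            if substrate == metabolite and product not in visited:
                visited.add(product)
                queue.append((product, route + [reaction_id]))
    return None  # target cannot be reached from start


print(find_route(PATHWAY, "geranyl diphosphate", "dihydroartemisinic acid"))
print(find_route(PATHWAY, "geranyl diphosphate", "artemisinic acid"))
```

The two routes share every step up to artemisinic aldehyde, which is the branch point where DBR2 and ALDH1_CYP71AV1_2 compete for the same substrate.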

After the seven reactions were implemented in the GSM model, DHAA production was tested, and the modified model produced DHAA successfully. The quality of the modified model was also assessed with Memote and showed no significant difference from the original model. The modified GSM model is saved as iYO844_modified.xml in the data folder, and the full analysis can be seen in [1_Incorporation_of_heterologous_genes.ipynb](/1_Incorporation_of_heterologous_genes.ipynb).
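Mass balance is one of the properties Memote reports, and it can be illustrated by hand for a single added reaction. The sketch below checks the ADS step (FPP to amorphadiene plus diphosphate) using neutral molecular formulas; the model itself may store charged formulas, so the strings here are assumptions for illustration, and `parse_formula` and `element_imbalance` are hypothetical helpers rather than Memote or COBRApy functions.

```python
import re
from collections import defaultdict


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C15H24' (no brackets, no charges)."""
    counts = defaultdict(int)
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return dict(counts)


def element_imbalance(substrates, products):
    """Return {element: products minus substrates} for every unbalanced element.

    Both arguments map a formula string to its stoichiometric coefficient.
    An empty result means the reaction is mass balanced.
    """
    totals = defaultdict(int)
    for side, sign in ((products, 1), (substrates, -1)):
        for formula, coefficient in side.items():
            for element, n in parse_formula(formula).items():
                totals[element] += sign * coefficient * n
    return {element: n for element, n in totals.items() if n != 0}


# Neutral formulas (assumed): FPP C15H28O7P2, amorphadiene C15H24, diphosphate H4P2O7.
ads_substrates = {"C15H28O7P2": 1}
ads_products = {"C15H24": 1, "H4P2O7": 1}

print(element_imbalance(ads_substrates, ads_products))        # balanced reaction
print(element_imbalance(ads_substrates, {"C15H24": 1}))       # diphosphate forgotten
```

Leaving out the diphosphate by-product makes the check report missing hydrogen, oxygen, and phosphorus, which is the kind of error a mass-balance score is designed to expose.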

### Maximum Theoretical Yield
To optimise the model for higher production of DHAA, different suitable carbon sources were tested to investigate which would result in the highest maximum theoretical yield. The model is by default run with glucose as the carbon source, yielding a maximum theoretical yield of 0.214 mmol DHAA per mmol glucose (converted to 0.53 Cmol DHAA per Cmol glucose). 

Alternative carbon sources in our model were investigated and their ability to support DHAA production was assessed. Only carbon sources resulting in a growth rate greater than zero were investigated further, and the production rate of DHAA from these was reported in mmol DHAA gDW$^{-1}$ h$^{-1}$. From this, the 20 alternative carbon sources giving the highest DHAA production rates were examined (see Table X). Maltotriose, maltose, and sucrose looked especially promising. These three carbon sources were analysed further because of the high growth rates and DHAA production rates they confer, and because they are more readily available for fermentation than the other carbon sources. The maximum theoretical yields of DHAA obtained from these carbon sources were compared to that of glucose.

The yields in mmol/mmol obtained from the three alternative carbon sources were all greater than that obtained from glucose (see Figure XX). Maltotriose yielded 0.76 mmol DHAA per mmol maltotriose, and maltose and sucrose both yielded 0.48 mmol DHAA per mmol carbon source. However, after conversion to Cmol/Cmol, the maximum theoretical yields obtained from maltotriose (0.63 Cmol DHAA per Cmol maltotriose), maltose (0.61 Cmol DHAA per Cmol maltose), and sucrose (0.61 Cmol DHAA per Cmol sucrose) were not substantially higher than that obtained from glucose (see Figure XXX). Furthermore, increasing the uptake bound for the different carbon sources did not change the production rate of DHAA significantly. Therefore, it was decided to proceed with glucose as the carbon source for DHAA production. The full analysis can be seen in [2_Maximum_theoretical_yield.ipynb](/2_Maximum_theoretical_yield.ipynb).
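The conversion from mmol/mmol to Cmol/Cmol only rescales each yield by the ratio of carbon atoms in product and substrate. The sketch below applies it to the rounded yields reported in this section; it assumes DHAA has 15 carbon atoms (C15H24O2) and uses the standard carbon counts of the sugars, and `cmol_yield` is an illustrative helper rather than a function from the notebook.

```python
DHAA_CARBONS = 15  # dihydroartemisinic acid, C15H24O2

# Carbon atoms per molecule and the reported yield in mmol DHAA per mmol substrate.
CARBON_SOURCES = {
    "glucose": (6, 0.214),
    "maltose": (12, 0.48),
    "sucrose": (12, 0.48),
    "maltotriose": (18, 0.76),
}


def cmol_yield(mol_yield, substrate_carbons, product_carbons=DHAA_CARBONS):
    """Convert a mol/mol yield into Cmol product per Cmol substrate."""
    return mol_yield * product_carbons / substrate_carbons


for name, (carbons, mol_yield) in CARBON_SOURCES.items():
    converted = cmol_yield(mol_yield, carbons)
    print(f"{name:12s} {mol_yield:.3f} mmol/mmol -> {converted:.3f} Cmol/Cmol")
```

Because the inputs are already rounded, the converted values can differ from the reported ones in the second decimal (for example about 0.60 rather than 0.61 for maltose and sucrose). The conclusion is unaffected: on a per-carbon basis all four substrates fall in a narrow range.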

![Max yield mmol](figures/Maximum_Yield_from_Different_Carbon_Sources_mol.png)
![Max yield cmol](figures/Maximum_Yield_from_Different_Carbon_Sources_cmol.png)\
Figure 

## 5. Discussion (<500 words)

## 6. Conclusion (<200 words)

## References