# CSV to PDF pipeline
This notebook helps you customise the JSON output when none of the already available outputs suits you.  
It assumes that you have the following dependency installed:
- Python 3  

And the following library:
- pandas  

If you haven't installed it, you can uncomment the following code:

In [None]:
#pip install pandas

The original pipeline depends on a config object built from the user's options.
These options are:
- large json
- inline downgrades
- reduction category
- goe

To see the default JSON outputs and what each option does, read the JSON Output section of the Readme.md.

## Roadmap
- [Initial setup](#initial-setup)
- [Options](#options)
    - [Reduction category](#reduction-category)
    - [Downgrades value](#downgrades-value)
    - [Large output](#large-output)
- [Creation of elements entries](#creation-of-elements-entries)
- [Pipeline](#pipeline)
- [Write the json file](#write-the-json-file)

- [Playground](#playground) : enter the path of your csv file, choose your config and experiment with the functions.

# Initial setup

In [None]:
import pandas as pd
import json
import logging

The `GOE` variable lists the GOE column names of the CSV, from `-5` to `5`, with `BASE` in the middle.

In [None]:
GOE= ['-5', '-4', '-3', '-2', '-1', 'BASE', '1', '2', '3', '4', '5']

The `Config` class is used by the pipeline. Most of the functions take a `Config` instance as an argument.

In [None]:
class Config:
    def __init__(self, largeOutput=False, inline_downgrades=False, reductionCategory=False, goe=False):
        self.largeOutput = largeOutput
        self.inline_downgrades = inline_downgrades
        self.reductionCategory = reductionCategory
        self.goe = goe
    
    @classmethod
    def synchro_skate_calc(cls):
        return cls(False,False,True,False)
    
    def inline_dg(self,value:bool):
        self.inline_downgrades=value


# Options
## Reduction Category
This section is divided into two parts:
- Verifying that the reduction can actually be made (are the elements of the category actually equal?)
- If it can be made, creating an entry in the JSON for this category.

### Are the categories equal

The first function searches for categories that hold several different elements under the same category name.

In [None]:
def FindCategoryofElements(df,categorylist): #=df["Category"].unique()
    List=[]
    for category in categorylist:
        Cat=df.query(f"Category == '{category}'")["Element"].unique()
        if len(Cat)>1:
            List.append({"Category":category,"Elements":Cat})
    return List

The goal here is to determine whether all elements under a category behave the same.
For example, if all Artistic Elements have identical GOE and base-value rows, then they can be merged into a single JSON section.

- `FindCategoryofElements()` builds a list of dictionaries such as:
`{"Category": "ARTISTIC ELEMENTS", "Elements": ["AC", "AB", "AW", "AL"]}`

- `CategoryEqual()` checks whether these elements all share the same GOE table. `ListElem` is assumed to be the `"Elements"` list of one of the dictionaries returned by `FindCategoryofElements()`.

If they do, the category can be reduced into a single JSON entry.
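
As a minimal sketch of the comparison idea, the toy table below holds two elements of one category. All values are invented, and the shortened `GOE_COLS` list and the helper `same_goe_table()` are hypothetical names used only for this illustration. The GOE rows of both elements are compared after resetting the index, so that only the values and their order matter.

```python
import pandas as pd

# Hypothetical, shortened GOE column list and invented values for illustration.
GOE_COLS = ["-1", "BASE", "1"]
toy = pd.DataFrame({
    "Category": ["ARTISTIC ELEMENTS"] * 4,
    "Element": ["AC", "AC", "AB", "AB"],
    "ElmntLvl": ["B", "1", "B", "1"],
    "-1": [0.9, 1.8, 0.9, 1.8],
    "BASE": [1.0, 2.0, 1.0, 2.0],
    "1": [1.1, 2.2, 1.1, 2.2],
})

def same_goe_table(df, first, second):
    """Return True when both elements have identical GOE rows, in the same order."""
    left = df[df["Element"] == first][GOE_COLS].reset_index(drop=True)
    right = df[df["Element"] == second][GOE_COLS].reset_index(drop=True)
    return left.equals(right)

mergeable = same_goe_table(toy, "AC", "AB")

# Changing a single value makes the two tables differ.
changed = toy.copy()
changed.loc[3, "BASE"] = 2.5
still_mergeable = same_goe_table(changed, "AC", "AB")
```

Here `mergeable` is true and `still_mergeable` is false: one differing cell is enough to keep the elements as separate JSON entries.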

In [None]:
def CategoryEqual(df,ListElem):
    # ListElem holds values of the `Element` column, so the query filters on that column
    temp=df.query(f"Element == '{ListElem[0]}'")[GOE].reset_index(drop=True)
    for i in range(1,len(ListElem)):
        elem=df.query(f"Element == '{ListElem[i]}'")[GOE].reset_index(drop=True)
        if not elem.equals(temp): # the two GOE tables differ
            return False
    return True

### Creating category entries
The following function takes `dictElement`, the dictionary that will be turned into JSON at the end.


The category names are built from the element name (`Element` column of the CSV), which is assumed to follow this pattern:  
**[category Symbol] + [Element Symbol]**  

Here the element symbol is assumed to be a single letter, for example:
- `AB` is a variation of `B`, assumed category symbol : `A`
- `CrI` is a variation of `I`, assumed category symbol : `Cr`

But for some categories, such as *LINEAR AND ROTATING ELEMENTS*, the element names are composed of a single letter only. In that case the category name is used instead.
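
The naming rule above can be sketched in isolation. The helper `category_key()` is a hypothetical name, and both `"CREATIVE ELEMENTS"` and the single-letter element `"L"` are placeholders chosen for the illustration, not values taken from a real CSV.

```python
def category_key(element_name, category_name):
    """Hypothetical helper mirroring the naming rule described above."""
    # Drop the last letter (the element symbol) to keep the category symbol;
    # fall back to the full category name for single-letter elements.
    return element_name[:-1] if len(element_name) > 1 else category_name

keys = [
    category_key("AB", "ARTISTIC ELEMENTS"),
    category_key("CrI", "CREATIVE ELEMENTS"),            # placeholder category name
    category_key("L", "LINEAR AND ROTATING ELEMENTS"),   # placeholder element name
]
```

Under these assumptions `keys` holds `"A"`, `"Cr"` and `"LINEAR AND ROTATING ELEMENTS"`.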

In [None]:
def reductionCategory(df,dictElement,config:Config):
    cat=FindCategoryofElements(df,df["Category"].unique())

    if not config.inline_downgrades:
        query="DGrade == 0"
        tempdf = df.query(query)
    else:
        query=""
        tempdf=df
    for i in range(len(cat)):
        equal=CategoryEqual(tempdf,cat[i]["Elements"])
        if equal:
            firstElement=cat[i]["Elements"][0]

            # Define the element name in the json
            name=firstElement[:-1] if len(firstElement)>1 else cat[i]["Category"]

            # Create the entry in the dictionary
            dictElement[name]={}
            # Only single quotes inside the f-string, which stays valid before Python 3.12
            fillElement(tempdf.query(f"Element == '{firstElement}'"),config,dictElement[name])

            # Exclude the reduced category from the rows processed later
            if len(query)>0:
                query+= f" and Category != '{cat[i]['Category']}'"
            else :
                query+= f"Category != '{cat[i]['Category']}'"
    return query
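
The string returned by this function is a pandas query that filters out the reduced categories, and the downgraded rows when downgrades are kept separate. As a small sketch of how such a string is assembled, the hypothetical helper `build_query()` below joins the conditions with `and`; it is an illustration, not part of the pipeline.

```python
def build_query(separate_downgrades, reduced_categories):
    """Hypothetical helper: join the filter conditions into one query string."""
    parts = []
    if separate_downgrades:
        # Keep only the rows without downgrade.
        parts.append("DGrade == 0")
    # Skip every category that already has a merged entry.
    parts += [f"Category != '{name}'" for name in reduced_categories]
    return " and ".join(parts)

with_filters = build_query(True, ["ARTISTIC ELEMENTS"])
no_filter = build_query(False, [])
```

An empty string means that no filter is needed, which is why the pipeline only calls `df.query()` when the query is not empty.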

## Downgrades value
This is one of the default features. This option adds the downgrades as an entry of their own, next to the elements.  

- This part assumes that only two downgrades exist: `<` and `<<`
- The downgrades are also assumed to have a fixed value, independent of the element and of its level
- Since downgrades were introduced recently, no downgrades will be added to the JSON if none are found.
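
To give an idea of the resulting entry, here is a minimal sketch of the `"Downgrades"` section once the two values are known. The deductions `0.5` and `1.0` are invented for the illustration; the real values are read from the CSV.

```python
import json

# Invented deductions for "<" and "<<"; real values come from the CSV.
downgrade_values = (0.5, 1.0)

# One key per downgrade symbol, in the same order as the values.
entry = {"Downgrades": dict(zip(["<", "<<"], downgrade_values))}
text = json.dumps(entry, indent=4)
```

Printing `text` shows the section as it would appear in the JSON file.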

The first function finds the downgrade values.

In [None]:
def findDGval(df):
    try :
        rowdg1=df.query("DGrade == 1").iloc[0]
        rowdg2=df.query("DGrade == 2").iloc[0]
        # Only single quotes inside the f-strings, which stays valid before Python 3.12
        row1=df.query(f"Element == '{rowdg1['Element']}' and ElmntLvl == '{rowdg1['ElmntLvl']}' and DGrade == 0")
        row2=df.query(f"Element == '{rowdg2['Element']}' and ElmntLvl == '{rowdg2['ElmntLvl']}' and DGrade == 0")
        return ((row1["BASE"]-rowdg1["BASE"]).iloc[0],(row2["BASE"]-rowdg2["BASE"]).iloc[0])
    except Exception as e:
        # Use the `logging` module imported in the setup cell
        logging.warning(f"find Downgrade Value Failed : {e}")
        return None

The second function verifies that all applied downgrade values are equal. This means that a downgrade applied to a Level 4 element deducts the same value as one applied to a Level 1 element (independence assumption).  

You can leave this function out of your pipeline if you are sure all downgrades are the same.
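
The independence assumption can also be checked on a whole table at once. The sketch below is an alternative illustration using a merge, not the approach of the function that follows. The toy values are invented, and the names `base`, `downgraded` and `merged` are chosen only for this example.

```python
import pandas as pd

# Invented base values for one element at two levels, with both downgrades.
toy = pd.DataFrame({
    "Element": ["I", "I", "I", "I", "I", "I"],
    "ElmntLvl": ["1", "1", "1", "4", "4", "4"],
    "DGrade": [0, 1, 2, 0, 1, 2],
    "BASE": [2.0, 1.5, 1.0, 5.0, 4.5, 4.0],
})

# Pair every downgraded row with the non-downgraded row of the same element and level.
base = toy[toy["DGrade"] == 0][["Element", "ElmntLvl", "BASE"]]
downgraded = toy[toy["DGrade"] != 0]
merged = downgraded.merge(base, on=["Element", "ElmntLvl"], suffixes=("", "_full"))
merged["deduction"] = merged["BASE_full"] - merged["BASE"]

# The assumption holds when each downgrade level maps to exactly one deduction.
deductions_per_grade = merged.groupby("DGrade")["deduction"].nunique()
independent = bool((deductions_per_grade == 1).all())
```

With real data, small floating-point differences may make otherwise equal deductions compare as different; rounding before comparing is one possible way to handle that.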

In [None]:
def DowngradesValueEqual(df,dg): #=findDGval(df)
    dgdf=df.query("DGrade != 0")
    for tup in dgdf.itertuples():
        temp=(df.query(f"Element == '{tup.Element}' and ElmntLvl == '{tup.ElmntLvl}' and DGrade == 0")["BASE"]-tup.BASE).iloc[0]
        if not dg[tup.DGrade-1]==temp:
            return False
    return True

## Large output

In [None]:
def LargeJson(df):
    element={}
    for row in df.itertuples():
        if pd.isna(row.AFNot) or row.AFNot =="-":
            element[row.ElmtNot]={"base":row.BASE,"goe":dict(zip(GOE,row[7:18]))}
        else :
            element[row.ElmtNot+"+"+row.AFNot]={"base":row.BASE,"goe":dict(zip(GOE,row[7:18]))}
    return element

The output of this function can then directly be converted into a json.
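
The keys of the large output combine the element notation with the additional feature notation when there is one. A minimal sketch of that rule, where `large_key()` is a hypothetical helper and `"I4"` and `"pi"` are placeholder notations:

```python
def large_key(elmt_not, af_not):
    """Hypothetical helper: build the flat key used by the large output."""
    # A missing additional feature shows up as None, NaN (NaN != NaN) or "-".
    missing = af_not is None or af_not != af_not or af_not == "-"
    return elmt_not if missing else f"{elmt_not}+{af_not}"

plain = large_key("I4", "-")
empty = large_key("I4", float("nan"))
combined = large_key("I4", "pi")
```

Here `plain` and `empty` are both `"I4"`, while `combined` is `"I4+pi"`.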

# Creation of Elements entries

This part handles the output of each element in the JSON, for each level. The default value assigned is the base value; this can be changed through the goe option.

The role of the first function is to determine what will be at the end of the JSON:
```json
{"element" : 
    {"level": /* <BASE Value> or this dictionnary : {"base": <BASE> , "goe":{ <goe columns> }}*/
    }
}
```
You can modify it to make it return anything. The tuple attributes are the same as the CSV columns, except for the GOE columns: their names are numbers, so they have to be accessed by position with `[]`.   

The GOE columns correspond to the following indexes:
- `'-5'`, `'-4'`, `'-3'`, `'-2'`, `'-1'` correspond to the indexes 7 to 11
- `'BASE'` corresponds to the index 12
- `'1'`, `'2'`, `'3'`, `'4'`, `'5'` correspond to the indexes 13 to 17
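
The index mapping can be tried on a plain tuple standing in for one row of `DataFrame.itertuples()`. The values below are invented, and `"c1"` to `"c6"` are placeholders for the six columns that precede the GOE columns.

```python
GOE = ['-5', '-4', '-3', '-2', '-1', 'BASE', '1', '2', '3', '4', '5']

# Position 0 is the row index, positions 1 to 6 are the non-GOE columns
# and positions 7 to 17 are the GOE columns, in the order of the GOE list.
row = (0, "c1", "c2", "c3", "c4", "c5", "c6",
       0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5)

# Pair each GOE name with the value found at the matching position.
goe = dict(zip(GOE, row[7:18]))
```

In this sketch `goe["-5"]` is `row[7]`, `goe["BASE"]` is `row[12]` and `goe["5"]` is `row[17]`.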

In [None]:
def outputValue(tup,config:Config):
    if config.goe:
        return {"base":tup.BASE,"goe":dict(zip(GOE,tup[7:18]))}
    else :
        return tup.BASE

In [None]:
def DowngradeKey(DowngradeValue):
    return "NoDg" if DowngradeValue==0 else DowngradeValue*"<"

The `element` argument is the dictionary associated with the element. More information may be added in the [pipeline](#pipeline) or directly next to the levels through `outputValue()` or in any other manner.

In [None]:
def fillElement(elementGroup,config:Config,element):
    
    #Without Additional Feature
    if elementGroup["AFNot"].isna().all():
        if not config.inline_downgrades:
            for Lvl in elementGroup.itertuples():
                element[Lvl.ElmntLvl]=outputValue(Lvl,config)
        else:
            for LvlElem,GroupLvl in elementGroup.groupby("ElmntLvl"):
                element[LvlElem]={}
                for Dg in GroupLvl.itertuples():
                    key=DowngradeKey(Dg.DGrade)
                    element[LvlElem][key]=outputValue(Dg,config)
   #with additional Feature                 
    else:
        if not config.inline_downgrades:
            for LvlElem,GroupLvl in elementGroup.groupby('ElmntLvl'):
                element[LvlElem]={}
                for AddF in GroupLvl.itertuples():
                    element[LvlElem][AddF.AFNot]=outputValue(AddF,config)
        else:
            for LvlElem,GroupLvl in elementGroup.groupby("ElmntLvl"):
                element[LvlElem]={}
                for AddF,GroupAF in GroupLvl.groupby("AFNot"):
                    element[LvlElem][AddF]={}
                    for Dg in GroupAF.itertuples():
                        key=DowngradeKey(Dg.DGrade)
                        element[LvlElem][AddF][key]=outputValue(Dg,config)
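
To visualise the two shapes produced for an element without additional feature, here is a pure-Python sketch that does not use pandas. The rows are invented, and `downgrade_key()` is a stand-in for `DowngradeKey()` so that the example runs on its own.

```python
# (ElmntLvl, DGrade, BASE) rows with invented values.
rows = [
    ("1", 0, 2.0), ("1", 1, 1.5), ("1", 2, 1.0),
    ("2", 0, 3.0), ("2", 1, 2.5), ("2", 2, 2.0),
]

def downgrade_key(dgrade):
    """Stand-in for DowngradeKey(): 0 gives "NoDg", 1 gives "<", 2 gives "<<"."""
    return "NoDg" if dgrade == 0 else "<" * dgrade

# Inline downgrades: one sub-dictionary per level, one key per downgrade.
inline = {}
for level, dgrade, base in rows:
    inline.setdefault(level, {})[downgrade_key(dgrade)] = base

# Separate downgrades: only the rows without downgrade are kept, one value per level.
separate = {level: base for level, dgrade, base in rows if dgrade == 0}
```

`separate` maps each level directly to its base value, while `inline` adds one nesting level keyed by `"NoDg"`, `"<"` and `"<<"`. With additional features, one more nesting level keyed by the feature notation is inserted in the same way.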

# Pipeline
- The following function creates the dictionary that will then be written to a JSON file.

- When searching for downgrades, if the downgrade values are not all equal to each other, the inline downgrade option will be applied even if it wasn't selected.

- You can change the keys of the elements, add information for each element, and so on.

In [None]:
def returnDict(df,config:Config):
    if config.largeOutput:
        
        return LargeJson(df)
    
    dictElement={}
    dg=findDGval(df)
    query=""
    if dg is not None :
        if not DowngradesValueEqual(df,dg) :
            logging.info("Downgrades values aren't equal")
            # `inline_downgrades` is the flag, `inline_dg()` is its setter
            if not config.inline_downgrades :
                config.inline_dg(True)
                logging.warning("Inline downgrade option will be applied")

        
        if not config.inline_downgrades:
            dictElement["Downgrades"]=dict(zip(["<","<<"],dg))
            query="DGrade == 0"
            logging.info("Downgrades Separated")

    if config.reductionCategory:
        query=reductionCategory(df,dictElement,config)
        logging.info("Category reduced")
    
    iterDf= df.query(query) if query else df
    
    for elem,group in iterDf.groupby('Element'):
        dictElement[elem]={}
        fillElement(group,config,dictElement[elem])
    logging.info("Json structure finished")
    return dictElement

# Write the json file

In [None]:
def returnJsonFile(input,config:Config,output):
    # Read the CSV, build the dictionary, then write it as indented JSON
    df=pd.read_csv(input)
    JsonDict=returnDict(df,config)
    with open(output,"w") as f :
        json.dump(JsonDict,f,indent=4)
    logging.info(f"Json generated in : {output}")

#returnJsonFile("filename.csv",config,"output.json")
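
If you change the structure of the dictionary, a quick round trip helps to confirm that it is still valid JSON. The sketch below writes an invented sample to a temporary folder and reads it back; the file name `elements.json` and the sample content are placeholders.

```python
import json
import os
import tempfile

# Invented sample with the same kind of nesting as the pipeline output.
sample = {"Downgrades": {"<": 0.5, "<<": 1.0}, "I": {"1": 2.0}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "elements.json")
    # Write with the same indentation as the pipeline.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=4)
    # Read the file back to check that nothing was lost.
    with open(path, encoding="utf-8") as f:
        reloaded = json.load(f)
```

If `reloaded` equals `sample`, the structure survives the write and can be read by any JSON consumer.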

# Playground
Here you can experiment. Just fill in the following fields to use the functions more conveniently.

In [None]:
input_file="your_csv_file.csv"       # path to your CSV file
output_file="your_output_json.json"  # path of the JSON file to write
df=pd.read_csv(input_file)

Your `Config` object :

In [None]:
config=Config(largeOutput=False, inline_downgrades=False, reductionCategory=False, goe=False) 