## Python wrangling of EV data using pandas

In [12]:
# First, import pandas, the library used here for data wrangling in Python
import pandas as pd

In [13]:
# Manually enter the data given in the question
data = {
    'Date': ['2024-01-01', '2024-01-02', '2024-01-12'],
    'Devices': [['abcdef', 'xyzzyx', 'ababcd'], ['opqrst', 'rd77dr'], ['xvynma']],
    'Issues': [['emergency_stop', 'manual', 'remote_stop'], ['remote_stop', 'emergency'], ['remote_stop']],
    'Organisation': ['EV_one', 'EV_two', 'EV_three'],
    'Pending': ['Yes', 'No', 'Yes']
}

>>The dictionary keys are the column names, and each value is a list holding that column's data, one entry per row.
The entries in the Devices and Issues columns are themselves lists, so a single row can contain more than one device and more than one issue.

In [14]:
df = pd.DataFrame(data)
print(df)

         Date                   Devices  \
0  2024-01-01  [abcdef, xyzzyx, ababcd]   
1  2024-01-02          [opqrst, rd77dr]   
2  2024-01-12                  [xvynma]   

                                  Issues Organisation Pending  
0  [emergency_stop, manual, remote_stop]       EV_one     Yes  
1               [remote_stop, emergency]       EV_two      No  
2                          [remote_stop]     EV_three     Yes  


In [15]:
df_normalized = df.explode(['Devices', 'Issues'])

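>>The *explode* method is called on the Devices and Issues columns of the DataFrame df.
*explode* normalises the DataFrame by giving each element of a list-like column its own row. When several columns are exploded together (supported from pandas 1.3), the lists within each row must have the same length, and their elements are paired by position.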
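>>Because a multi-column *explode* pairs list elements by position, it can help to confirm that the list lengths agree before calling it. The following is a minimal sketch on a small made-up frame (the name `sample` and its values are illustrative, not the data from the question), assuming pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration
sample = pd.DataFrame({
    'Devices': [['a1', 'b2'], ['c3']],
    'Issues': [['manual', 'remote_stop'], ['emergency']],
})

# Compare the list lengths of the two columns row by row
lengths_match = (sample['Devices'].map(len) == sample['Issues'].map(len)).all()

if lengths_match:
    # Safe to explode both columns together; elements are paired by position
    exploded = sample.explode(['Devices', 'Issues'])
    print(exploded)
else:
    print('Devices and Issues have different list lengths in at least one row')
```

>>If the check fails, the mismatched rows would need to be fixed or handled separately before exploding both columns in one call.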

In [16]:
print(df_normalized)

         Date Devices          Issues Organisation Pending
0  2024-01-01  abcdef  emergency_stop       EV_one     Yes
0  2024-01-01  xyzzyx          manual       EV_one     Yes
0  2024-01-01  ababcd     remote_stop       EV_one     Yes
1  2024-01-02  opqrst     remote_stop       EV_two      No
1  2024-01-02  rd77dr       emergency       EV_two      No
2  2024-01-12  xvynma     remote_stop     EV_three     Yes


#### Important Points:
>>*explode* converts columns containing list-like elements so that each element of a list gets its own row; this is the normalisation step.

>>This prepares the data for further processing or analysis.

>>Data duplication: after exploding, the values in the non-list columns (such as Date, Organisation, and Pending) are repeated on every row produced from the same original row. The original index labels are repeated as well, as the output above shows.
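>>The repetition described in the points above also applies to the index. If a unique row index is wanted afterwards, one option is to renumber the rows. A minimal sketch on a small made-up frame (the name `sample` is illustrative):

```python
import pandas as pd

# Hypothetical frame with one list column and one scalar column
sample = pd.DataFrame({
    'Organisation': ['EV_one', 'EV_two'],
    'Devices': [['abcdef', 'xyzzyx'], ['opqrst']],
})

exploded = sample.explode('Devices')
# The scalar column and the index label are repeated for each list element
print(exploded)

# Renumber the rows 0..n-1 and discard the repeated index labels
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```

>>Keeping the repeated index can still be useful, since it records which original row each exploded row came from.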

In [19]:
# Save a CSV copy of the result; index=False leaves the index out of the file
df_normalized.to_csv('normalized_data.csv', index=False)
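>>To check what a CSV written this way looks like when loaded again, the round trip can be sketched as below. This is only an illustration: it uses a small made-up frame and an in-memory buffer (`sample` and `buffer` are hypothetical names) rather than the file on disk.

```python
import io
import pandas as pd

# Hypothetical already-exploded frame used only for illustration
sample = pd.DataFrame({
    'Date': ['2024-01-01', '2024-01-01'],
    'Devices': ['abcdef', 'xyzzyx'],
})

# Write to an in-memory text buffer instead of a file
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Rewind and read the CSV back, parsing Date as a datetime column
buffer.seek(0)
restored = pd.read_csv(buffer, parse_dates=['Date'])
print(restored.dtypes)
```

>>Without `parse_dates`, the Date column would be read back as plain strings.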