# **Reading and Managing JSON Data**

This project analyzes the results of a promotional sales event for an online retail company. The dataset lists the top-spending customers on each day of the 5-day event. The goal is to identify the customer with the highest total purchase amount, who will receive a prize, and to support future marketing strategies.

- Imports the required libraries:

  - pandas for data manipulation

  - numpy for numerical operations

In [None]:
import pandas as pd
import numpy as np

- Reads a JSON file containing customer sales data into a pandas DataFrame called dados.

In [None]:
dados = pd.read_json('/content/dados_vendas_clientes.json')
dados

Unnamed: 0,dados_vendas
0,"{'Data de venda': '06/06/2022', 'Cliente': ['@..."
1,"{'Data de venda': '07/06/2022', 'Cliente': ['I..."
2,"{'Data de venda': '08/06/2022', 'Cliente': ['I..."
3,"{'Data de venda': '09/06/2022', 'Cliente': ['J..."
4,"{'Data de venda': '10/06/2022', 'Cliente': ['M..."
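The single `dados_vendas` column of dictionaries shown above is what `read_json` produces when the file has one top-level key holding a list of records. The sketch below reproduces that shape from an in-memory string. The sample records, names, and values are invented for illustration, and the file layout is an assumption inferred from the output above.

```python
import io
import pandas as pd

# Hypothetical two-day sample; names and values are invented for illustration
raw = """
{"dados_vendas": [
  {"Data de venda": "06/06/2022", "Cliente": ["Ana 1", "Bia 2"],
   "Valor da compra": ["R$ 10,5", "R$ 20,0"]},
  {"Data de venda": "07/06/2022", "Cliente": ["Caio 3", "Davi 4"],
   "Valor da compra": ["R$ 30,0", "R$ 40,25"]}
]}
"""

# read_json accepts a file-like object as well as a path
amostra = pd.read_json(io.StringIO(raw))

print(amostra.shape)                          # one column, one row per day
print(type(amostra.loc[0, 'dados_vendas']))   # each cell is still a dict
```

Because every cell is still a nested dictionary, the frame is not yet usable for analysis, which is why a flattening step follows.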


- Flattens the nested JSON structure (in the dados_vendas key) into a normalized tabular format.

In [None]:
dados = pd.json_normalize(dados['dados_vendas'])
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,06/06/2022,"[@ANA _LUCIA 321, DieGO ARMANDIU 210, DieGO AR...","[R$ 836,5, R$ 573,33, R$ 392,8, R$ 512,34]"
1,07/06/2022,"[Isabely JOanes 738, Isabely JOanes 738, Isabe...","[R$ 825,31, R$ 168,07, R$ 339,18, R$ 314,69]"
2,08/06/2022,"[Isabely JOanes 738, JOãO Gabriel 671, Julya m...","[R$ 682,05, R$ 386,34, R$ 622,65, R$ 630,79]"
3,09/06/2022,"[Julya meireles 914, MaRIA Julia 444, MaRIA Ju...","[R$ 390,3, R$ 759,16, R$ 334,47, R$ 678,78]"
4,10/06/2022,"[MaRIA Julia 444, PEDRO PASCO 812, Paulo castr...","[R$ 314,24, R$ 311,15, R$ 899,16, R$ 885,24]"
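As a minimal sketch of what the flattening does, `json_normalize` can be applied to a small list of dictionaries with the same keys. The records below are invented for illustration. Each key becomes a column, while the lists inside each record are left intact as cell values.

```python
import pandas as pd

# Hypothetical records shaped like the entries under dados_vendas
registros = [
    {"Data de venda": "06/06/2022",
     "Cliente": ["Ana 1", "Bia 2"],
     "Valor da compra": ["R$ 10,5", "R$ 20,0"]},
    {"Data de venda": "07/06/2022",
     "Cliente": ["Caio 3", "Davi 4"],
     "Valor da compra": ["R$ 30,0", "R$ 40,25"]},
]

# One row per record, one column per key; list values are not expanded
plano = pd.json_normalize(registros)

print(list(plano.columns))
print(plano.loc[0, 'Cliente'])
```

The lists still need to be expanded into one row per purchase, which is the job of `explode()`.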


- Stores the list of column names from the DataFrame dados in the variable cols.

In [None]:
cols = list(dados.columns)
cols

['Data de venda', 'Cliente', 'Valor da compra']

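- Uses explode() to expand the list-like elements in every column except the first (Cliente and Valor da compra) into separate rows, one row per purchase.
The lists in a given row must have the same length in each exploded column, and passing several columns at once requires pandas 1.3 or later.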

In [None]:
dados = dados.explode(cols[1:])
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,06/06/2022,@ANA _LUCIA 321,"R$ 836,5"
0,06/06/2022,DieGO ARMANDIU 210,"R$ 573,33"
0,06/06/2022,DieGO ARMANDIU 210,"R$ 392,8"
0,06/06/2022,DieGO ARMANDIU 210,"R$ 512,34"
1,07/06/2022,Isabely JOanes 738,"R$ 825,31"
1,07/06/2022,Isabely JOanes 738,"R$ 168,07"
1,07/06/2022,Isabely JOanes 738,"R$ 339,18"
1,07/06/2022,Isabely JOanes 738,"R$ 314,69"
2,08/06/2022,Isabely JOanes 738,"R$ 682,05"
2,08/06/2022,JOãO Gabriel 671,"R$ 386,34"


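- Resets the row index after the explode operation, replacing the repeated index values with a fresh sequence starting at 0. drop = True discards the old index instead of keeping it as a column; the row order is unchanged.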

In [None]:
dados.reset_index(inplace = True, drop = True)
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,06/06/2022,@ANA _LUCIA 321,"R$ 836,5"
1,06/06/2022,DieGO ARMANDIU 210,"R$ 573,33"
2,06/06/2022,DieGO ARMANDIU 210,"R$ 392,8"
3,06/06/2022,DieGO ARMANDIU 210,"R$ 512,34"
4,07/06/2022,Isabely JOanes 738,"R$ 825,31"
5,07/06/2022,Isabely JOanes 738,"R$ 168,07"
6,07/06/2022,Isabely JOanes 738,"R$ 339,18"
7,07/06/2022,Isabely JOanes 738,"R$ 314,69"
8,08/06/2022,Isabely JOanes 738,"R$ 682,05"
9,08/06/2022,JOãO Gabriel 671,"R$ 386,34"


- Displays the structure of the DataFrame, including data types and non-null counts for each column.

In [None]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Data de venda    20 non-null     object
 1   Cliente          20 non-null     object
 2   Valor da compra  20 non-null     object
dtypes: object(3)
memory usage: 612.0+ bytes


Cleans the Valor da compra (purchase value) column by:

- Removing "R$" symbols

- Replacing commas with dots for decimal formatting

- Removing leading/trailing whitespace

In [None]:
dados['Valor da compra'] = dados['Valor da compra'].apply(lambda x: x.replace('R$', '').replace(',', '.').strip())
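The same cleaning steps can also be written with the vectorized `.str` accessor. The sketch below is one possible variant, not the notebook's method. It additionally drops `.` thousands separators before swapping the decimal comma, which is an assumption about how larger amounts might be formatted; the dataset shown here has no such values, and the third sample value is invented to illustrate the case.

```python
import pandas as pd

# Sample strings; the last one is a hypothetical value with a thousands separator
valores = pd.Series(['R$ 836,5', 'R$ 573,33', 'R$ 1.234,56'])

limpo = (
    valores
    .str.replace('R$', '', regex=False)   # remove the currency symbol
    .str.replace('.', '', regex=False)    # drop thousands separators, if any
    .str.replace(',', '.', regex=False)   # decimal comma -> decimal point
    .str.strip()                          # trim surrounding whitespace
)

# to_numeric raises on anything that still is not a valid number
numeros = pd.to_numeric(limpo)
print(numeros)
```

Passing `regex=False` matters here because both `$` and `.` are special characters in regular expressions.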

- Displays the DataFrame to inspect changes made in the previous cleaning step.

In [None]:
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,06/06/2022,@ANA _LUCIA 321,836.5
1,06/06/2022,DieGO ARMANDIU 210,573.33
2,06/06/2022,DieGO ARMANDIU 210,392.8
3,06/06/2022,DieGO ARMANDIU 210,512.34
4,07/06/2022,Isabely JOanes 738,825.31
5,07/06/2022,Isabely JOanes 738,168.07
6,07/06/2022,Isabely JOanes 738,339.18
7,07/06/2022,Isabely JOanes 738,314.69
8,08/06/2022,Isabely JOanes 738,682.05
9,08/06/2022,JOãO Gabriel 671,386.34


- Converts the Valor da compra column from string to numeric (float32) so it can be used in calculations.

In [None]:
dados['Valor da compra'] = dados['Valor da compra'].astype(np.float32)

- Re-checks the structure of the DataFrame to confirm the data type conversion for Valor da compra.

In [None]:
dados.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Data de venda    20 non-null     object 
 1   Cliente          20 non-null     object 
 2   Valor da compra  20 non-null     float32
dtypes: float32(1), object(2)
memory usage: 532.0+ bytes
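A side effect of choosing float32 is that later displays show values such as 573.330017 instead of 573.33, because single precision cannot represent most two-decimal amounts closely enough to hide the rounding. The sketch below compares the two precisions for one value from the dataset. It is only an illustration of the trade-off; float32 saves memory, while float64 (the pandas default) keeps the display cleaner.

```python
import numpy as np

valor = 573.33

# Convert back to a Python float to expose the stored value
como_f32 = float(np.float32(valor))
como_f64 = float(np.float64(valor))

print(como_f32)            # slightly off from 573.33
print(como_f64)            # round-trips unchanged
print(round(como_f32, 2))  # rounding to cents recovers the amount
```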


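Cleans the Cliente (customer name) column by:

- Converting all names to lowercase

- Removing every character that is not a letter or a space (e.g., punctuation, symbols, digits), while keeping accented letters

- Collapsing repeated spaces and trimming leading/trailing whitespace

In [None]:
dados['Cliente'] = dados['Cliente'].str.lower().str.replace('[^a-záàâãéêíóôõúüç ]', '', regex = True).str.replace(' +', ' ', regex = True).str.strip()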

In [None]:
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,06/06/2022,ana lucia,836.5
1,06/06/2022,diego armandiu,573.330017
2,06/06/2022,diego armandiu,392.799988
3,06/06/2022,diego armandiu,512.340027
4,07/06/2022,isabely joanes,825.309998
5,07/06/2022,isabely joanes,168.070007
6,07/06/2022,isabely joanes,339.179993
7,07/06/2022,isabely joanes,314.690002
8,08/06/2022,isabely joanes,682.049988
9,08/06/2022,joão gabriel,386.339996
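The names above keep their accents (for example "joão gabriel"). If accent-insensitive matching were wanted, so that "joão" and "joao" fall into the same group, one option is to strip the accents with Unicode normalization before filtering. The helper below is a hypothetical sketch of that alternative and is not part of the notebook; `normalize_name` is an invented name.

```python
import re
import unicodedata

def normalize_name(name: str) -> str:
    """Lowercase, strip accents, keep only a-z and single spaces."""
    # Decompose accented characters into base letter + combining mark
    decomposed = unicodedata.normalize('NFKD', name.lower())
    # Drop the combining marks, leaving the base letters
    no_accents = ''.join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove digits, punctuation, and symbols
    letters_only = re.sub(r'[^a-z ]', '', no_accents)
    # Collapse repeated spaces and trim the ends
    return re.sub(r'\s+', ' ', letters_only).strip()

print(normalize_name('@ANA _LUCIA 321'))
print(normalize_name('JOãO Gabriel 671'))
```

With pandas this could be applied as `dados['Cliente'].apply(normalize_name)`.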


- Converts the Data de venda (sale date) column to datetime format for proper time-based analysis. The source strings are day-first (dd/mm/yyyy), so the format is given explicitly; without it, pandas reads 07/06/2022 as July 6 instead of June 7.

In [None]:
dados['Data de venda'] = pd.to_datetime(dados['Data de venda'], format = '%d/%m/%Y')

- Displays the DataFrame to verify the date conversion.

In [None]:
dados

Unnamed: 0,Data de venda,Cliente,Valor da compra
0,2022-06-06,ana lucia,836.5
1,2022-06-06,diego armandiu,573.330017
2,2022-06-06,diego armandiu,392.799988
3,2022-06-06,diego armandiu,512.340027
4,2022-06-07,isabely joanes,825.309998
5,2022-06-07,isabely joanes,168.070007
6,2022-06-07,isabely joanes,339.179993
7,2022-06-07,isabely joanes,314.690002
8,2022-06-08,isabely joanes,682.049988
9,2022-06-08,joão gabriel,386.339996
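Since the event ran on consecutive days in June 2022, the parsed dates should all fall in month 6. The sketch below parses a few of the date strings with an explicit day-first format and contrasts that with the default month-first reading of an ambiguous string. It is a small check under the assumption that every date in the file follows dd/mm/yyyy.

```python
import pandas as pd

datas = pd.Series(['06/06/2022', '07/06/2022', '10/06/2022'])

# Explicit format: day, then month, then four-digit year
convertidas = pd.to_datetime(datas, format='%d/%m/%Y')
print(convertidas.dt.day.tolist())
print(convertidas.dt.month.tolist())

# Without a format, an ambiguous string is read month-first
ambigua = pd.to_datetime('07/06/2022')
print(ambigua.month, ambigua.day)
```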


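- Groups the data by customer name and calculates the total amount spent by each customer.
It then sorts the totals in descending order so that the top spender, who wins the prize, appears first. In the output below, isabely joanes leads with a total of about 2329.30.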

In [None]:
dados.groupby('Cliente')['Valor da compra'].sum().sort_values(ascending=False)

Cliente,Valor da compra
isabely joanes,2329.300049
maria julia,2086.649902
julya meireles,1643.73999
diego armandiu,1478.469971
paulo castro,899.159973
thiago fritzz,885.23999
ana lucia,836.5
joão gabriel,386.339996
pedro pasco,311.149994
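To pull the winner out of the grouped totals programmatically rather than reading it off the table, `idxmax()` returns the index label with the largest value. The sketch below uses a small invented frame with the same column names; the customer labels and amounts are hypothetical.

```python
import pandas as pd

# Hypothetical purchases; 'a' buys twice
compras = pd.DataFrame({
    'Cliente': ['a', 'b', 'a', 'c'],
    'Valor da compra': [10.0, 25.0, 20.0, 5.0],
})

# Total spent per customer, largest first
totais = (
    compras.groupby('Cliente')['Valor da compra']
    .sum()
    .sort_values(ascending=False)
)

vencedor = totais.idxmax()       # label of the largest total
total_vencedor = totais.max()    # the corresponding amount
print(vencedor, total_vencedor)
```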
