DRAFT: Pull request for issue 16 of shared resources #7
base: main
Comparison between active and inactive teachers done (active teachers will have STATUS: ACTIVE, while inactive ones will have STATUS: DEACTIVE).
Code still incomplete in some parts.
Preparing for the PR.
Pull Request Overview
This pull request implements data comparison and normalization changes to support shared resource updates, integrating information from both database scripts and an updated sheet. Key changes include:
- Refactoring discipline normalization into a new class structure.
- Orchestrating JSON generation, comparison, and upload via a consolidated main.py.
- Adjusting file paths and parameter values (e.g., year/semester) to align with the new sheet for offers and teachers.
Reviewed Changes
Copilot reviewed 36 out of 49 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| processamento_planilha/normalize.py | Introduces a NormalizeDisciplinas class to encapsulate discipline normalization with an updated file path. |
| processamento_planilha/main.py | Adds a main orchestrator that sequentially calls JSON generation, comparison, normalization, and upload. |
| processamento_planilha/load_sheet_to_mongodb.py | Updates file path and adjusts static year/semester values for sheet data. |
| processamento_planilha/get_old_collections_from_mongo.py | Extracts collections from MongoDB with minor formatting improvements. |
| processamento_planilha/extract_collections.py | Provides comprehensive data extraction and JSON generation from an Excel sheet. |
| processamento_planilha/comparando_jsons.py | Implements the ComparisonModule for merging JSON data with deactivation logic. |
| extract_collections.py | Removes duplicate extraction logic to consolidate functionality. |
Files not reviewed (13)
- .env.example: Language not supported
- Dockerfile: Language not supported
- new_collection/CAMPUS.json: Language not supported
- new_collection/PERIODOS.json: Language not supported
- new_collection/SALAS.json: Language not supported
- processamento_planilha/comparedjsonfiles/CAMPUS.json: Language not supported
- processamento_planilha/comparedjsonfiles/CURSOS.json: Language not supported
- processamento_planilha/comparedjsonfiles/PERIODOS.json: Language not supported
- processamento_planilha/jsonfiles/CAMPUS.json: Language not supported
- processamento_planilha/jsonfiles/CURSOS.json: Language not supported
- processamento_planilha/jsonfiles/PERIODOS.json: Language not supported
- processamento_planilha/normalize.py~: Language not supported
- processamento_planilha/oldjsonfiles/CAMPUS.json: Language not supported
| class NormalizeDisciplinas: | ||
| @staticmethod | ||
| def normalize_disciplinas(): |
Copilot
AI
May 3, 2025
[nitpick] Consider adding a docstring for the NormalizeDisciplinas class and its normalize_disciplinas method to clearly describe its purpose and expected behavior.
| class NormalizeDisciplinas: | |
| @staticmethod | |
| def normalize_disciplinas(): | |
| class NormalizeDisciplinas: | |
| """ | |
| A class responsible for normalizing and updating the JSON file containing discipline data. | |
| It processes the data by grouping documents with the same `_id` and consolidating their | |
| `COD_CURS` values into a unique list. | |
| """ | |
| @staticmethod | |
| def normalize_disciplinas(): | |
| """ | |
| Normalizes the discipline data in the JSON file. | |
| This method performs the following steps: | |
| 1. Loads the discipline data from the JSON file. | |
| 2. Groups documents by `_id` and consolidates their `COD_CURS` values. | |
| 3. Saves the updated data back to the JSON file. | |
| """ |
| # Load the Excel file | ||
| file_path = 'sheet.xlsx' # Replace with your actual file path | ||
| file_path = '../sheet.xlsx' # Replace with your actual file path |
Copilot
AI
May 3, 2025
[nitpick] Using a relative path here may lead to issues if the script is executed from an unexpected directory; consider using a configurable base directory or an absolute path to ensure reliable file access.
| file_path = '../sheet.xlsx' # Replace with your actual file path | |
| file_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), '../sheet.xlsx') # Construct absolute path |
| for offer in newjson: | ||
| offer['ANO'] = 2025 | ||
| offer['SEMESTRE'] = 1 | ||
| resultjson = newjson + oldjson | ||
Copilot
AI
May 3, 2025
The merging of new and old offer JSON data in GenerateComparedOffers could result in duplicate entries if an offer exists in both; consider implementing a de-duplication step before writing the result.
| for offer in newjson: | |
| offer['ANO'] = 2025 | |
| offer['SEMESTRE'] = 1 | |
| resultjson = newjson + oldjson | |
| # Update new offers with the specified year and semester | |
| for offer in newjson: | |
| offer['ANO'] = 2025 | |
| offer['SEMESTRE'] = 1 | |
| # Use a dictionary to de-duplicate offers based on a unique key | |
| offers_dict = {} | |
| for offer in newjson + oldjson: | |
| unique_key = f"{offer['ID']}" # Replace 'ID' with the actual unique field(s) | |
| offers_dict[unique_key] = offer | |
| # Convert the dictionary values back to a list | |
| resultjson = list(offers_dict.values()) | |
| # Write the de-duplicated result to the output file |
The following changes were implemented:
The information pulled from the scripts database was compared with the information obtained from the provided 2025.1 sheet. Based on this comparison, a new boolean parameter named "ACTIVE" is added to the "teacher" collection, indicating whether the teacher is active (true) or inactive (false). It is set to false when a teacher who was previously in the database is not present in the new 2025.1 sheet.
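A minimal sketch of how the teacher comparison could work, not the code in this PR. It assumes teachers are plain dicts identified by an `_id` field (the key name is an assumption), and the function name `mark_active_teachers` is hypothetical.

```python
def mark_active_teachers(old_teachers, new_teachers, key="_id"):
    """Merge old and new teacher records and set the ACTIVE flag.

    A teacher is ACTIVE when present in the new sheet; a teacher found
    only in the old database is kept but flagged ACTIVE = False.
    The `key` field name is an assumption for illustration.
    """
    new_by_key = {teacher[key]: teacher for teacher in new_teachers}
    merged = {}

    # Teachers already in the database: active only if the new sheet has them.
    for teacher in old_teachers:
        merged[teacher[key]] = {**teacher, "ACTIVE": teacher[key] in new_by_key}

    # Teachers in the new sheet: always active; new fields override old ones.
    for teacher_key, teacher in new_by_key.items():
        merged[teacher_key] = {**merged.get(teacher_key, {}), **teacher, "ACTIVE": True}

    return list(merged.values())


old = [{"_id": 1, "NOME": "Ana"}, {"_id": 2, "NOME": "Bruno"}]
new = [{"_id": 2, "NOME": "Bruno"}, {"_id": 3, "NOME": "Carla"}]
result = mark_active_teachers(old, new)
```

Keying the merge on a dictionary keeps one record per teacher, so a teacher present in both sources is not duplicated.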
New parameters were added to the "offers" collection that indicate whether the offer was already in the database (tagged with year: 2024 and semester: 2) or came from the new 2025.1 sheet (tagged with year: 2025 and semester: 1).
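The tagging described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the field names `ANO` and `SEMESTRE` come from the reviewed diff, while the function name `tag_offers` is hypothetical.

```python
def tag_offers(old_offers, new_offers):
    """Tag offers with their year and semester and return one combined list.

    Offers already in the database get ANO 2024 / SEMESTRE 2; offers from
    the 2025.1 sheet get ANO 2025 / SEMESTRE 1. Inputs are not mutated.
    """
    tagged_old = [{**offer, "ANO": 2024, "SEMESTRE": 2} for offer in old_offers]
    tagged_new = [{**offer, "ANO": 2025, "SEMESTRE": 1} for offer in new_offers]
    # New offers first, then old ones, matching `newjson + oldjson` in the diff.
    return tagged_new + tagged_old


old_offers = [{"DISCIPLINA": "MAT001"}]
new_offers = [{"DISCIPLINA": "FIS002"}]
combined = tag_offers(old_offers, new_offers)
```

Because the year and semester are part of each record, an offer that appears in both sources stays distinguishable by term. Whether such pairs should be de-duplicated depends on which fields uniquely identify an offer.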
The code was consolidated so that it can be run from a single file (main.py).
NOTE: The database that is compared against the sheet and then updated is defined in the .env file (it can be "scripts" or "shared-resources").
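A hedged sketch of validating the configured database name before running the comparison. The variable name `DATABASE_NAME` is an assumption (the actual key in `.env` is not shown here), and the sketch assumes the `.env` values have already been loaded into the process environment.

```python
import os

# The two database names mentioned in the PR description.
ALLOWED_DATABASES = ("scripts", "shared-resources")


def resolve_database_name(env=None, var="DATABASE_NAME"):
    """Return the configured database name, or raise if it is not allowed.

    `var` is a hypothetical variable name; `env` defaults to os.environ
    and can be replaced by a plain dict, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    name = env.get(var)
    if name not in ALLOWED_DATABASES:
        raise ValueError(
            f"{var} must be one of {ALLOWED_DATABASES}, got {name!r}"
        )
    return name


selected = resolve_database_name({"DATABASE_NAME": "shared-resources"})
```

Failing early on an unexpected value avoids comparing against, and then updating, the wrong database.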
- Execution logic: