@ItsRikan (Contributor) commented Jan 8, 2026

Changes

1) Dynamic Folder Reader:

Scans a folder, attempts to load every file in it, and builds one dataset from all files that can be read. The docstrings were also improved.
**Effect:** Useful for real-world data that arrives in mixed formats (e.g., JSON, Excel, CSV).
**Changes Made In:** dskit/core.py and dskit/io.py (`read_folder` function in each)
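
For readers who have not opened the diff, the tolerant per-file loading described above could look roughly like the sketch below. It is not the code from this PR: `load_folder`, the `READERS` table, and the returned list of skipped files are hypothetical names chosen for illustration, and only the pandas readers themselves are real.

```python
from pathlib import Path

import pandas as pd

# Hypothetical extension-to-reader table; not taken from dskit.
READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
}


def load_folder(folder):
    """Load every readable file in `folder`; return (combined, skipped)."""
    frames = []
    skipped = []
    for path in sorted(Path(folder).iterdir()):
        reader = READERS.get(path.suffix.lower())
        if reader is None or not path.is_file():
            # Unknown extension or a sub-directory: record it and move on.
            skipped.append(path.name)
            continue
        try:
            frames.append(reader(path))
        except Exception:
            # A file that cannot be parsed should not abort the whole load.
            skipped.append(path.name)
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, skipped
```

Returning the skipped file names alongside the data is one way to keep a best-effort load from silently dropping inputs.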

2) Zero Division Check:

Adds checks to avoid zero-division errors.
**Effect:** More robust and reliable code.
**Changes Made In:** dskit/cleaning.py (`outlier_summary` and `remove_outliers`)
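
The comment does not say where the division happens, so the following is only an illustration of the kind of guard meant here, assuming the risky step is a percentage computed over the row count. `outlier_share` is a hypothetical helper, not a function from dskit/cleaning.py.

```python
import pandas as pd


def outlier_share(series: pd.Series) -> float:
    """Percentage of values outside the 1.5 * IQR fences; 0.0 for empty input."""
    values = series.dropna()
    total = len(values)
    if total == 0:
        # Guard: with no rows the share is undefined, so return 0.0
        # instead of dividing by zero below.
        return 0.0
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = values[(values < lower) | (values > upper)]
    return len(outliers) / total * 100
```

Whether an empty column should report 0.0, NaN, or raise a clearer error is a design choice; the sketch picks 0.0 only to keep the example short.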

@Sreoshh commented Jan 18, 2026

Proposed Solution

Extend the existing read_folder functionality to automatically detect and load mixed file formats in a directory.
Specifically:

Add a new optional parameter, `dynamic: bool = False`. When set to True, read_folder will:
- detect the format of each file (CSV, JSON, Excel),
- load each file individually into a pandas DataFrame,
- unify them into a single combined DataFrame.
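
To make the proposed flag concrete, here is a minimal sketch of how it might behave. Only the parameter `dynamic: bool = False` comes from the proposal; the `file_type` parameter, the behaviour when `dynamic` is False, and the `_READERS` table are assumptions and may not match the merged implementation.

```python
from pathlib import Path

import pandas as pd

# Assumed mapping; the real read_folder may support other formats.
_READERS = {".csv": pd.read_csv, ".json": pd.read_json, ".xlsx": pd.read_excel}


def read_folder(folder_path, file_type="csv", dynamic=False):
    """Sketch only: `file_type` and the non-dynamic branch are assumptions."""
    paths = sorted(p for p in Path(folder_path).iterdir() if p.is_file())
    if dynamic:
        # Detect the format per file from its extension.
        chosen = [p for p in paths if p.suffix.lower() in _READERS]
    else:
        wanted = f".{file_type.lower()}"
        if wanted not in _READERS:
            raise ValueError(f"unsupported file_type: {file_type!r}")
        # Default: keep only files of the single requested type.
        chosen = [p for p in paths if p.suffix.lower() == wanted]
    frames = [_READERS[p.suffix.lower()](p) for p in chosen]
    if not frames:
        return pd.DataFrame()
    # Columns missing from a given file become NaN in the combined frame.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Because `pd.concat` takes the union of columns, files with different schemas still combine, at the cost of NaN cells where a file lacks a column.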
I know this feature has already been implemented in a pull request on the DsKit repository, with multiple commits improving the read_folder logic and adding dynamic loading of multiple formats, but I'm open to feedback and willing to refactor it.

@Aksh-Agrawal Aksh-Agrawal merged commit 833989d into Programmers-Paradise:main Jan 20, 2026