## <font color='#696969'> Example: Import Multiple File Extensions</font>

This example shows how to import datasets stored in several file formats: `.csv`, `.xlsx`, `.json`, `.parquet`, and `.txt`. We’ll walk through setting up the necessary paths and then use `SmartDataLoader` to load each format into a consistent structure, which makes it easier for data practitioners to work with diverse data sources.
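To make the idea of handling several formats concrete, here is a minimal sketch of extension-based dispatch using plain pandas readers. It is not how `SmartDataLoader` is implemented. The names `READER_BY_EXTENSION`, `reader_name_for`, and `load_any` are hypothetical, and treating `.txt` as comma-delimited text is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the name of a pandas reader.
# Assumption: .txt files are delimited text that pandas.read_csv can parse.
READER_BY_EXTENSION = {
    ".csv": "read_csv",
    ".txt": "read_csv",
    ".xlsx": "read_excel",      # needs an Excel engine such as openpyxl
    ".json": "read_json",
    ".parquet": "read_parquet",  # needs pyarrow or fastparquet
}


def reader_name_for(path):
    """Return the pandas reader name for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return READER_BY_EXTENSION[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {suffix!r}") from None


def load_any(path):
    """Load a supported file into a DataFrame with the matching reader."""
    import pandas as pd  # imported lazily so the dispatch logic works without pandas

    return getattr(pd, reader_name_for(path))(path)
```

With this in place, `load_any("sales.csv")` and `load_any("sales.parquet")` would both return a DataFrame, which is the kind of uniform result the rest of this example relies on.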

__Please note that this might look different when you run it in your own project. If you follow the recommended setup, you can skip all the directory configuration__ 🚀. 

I’ll be creating a video tutorial soon and will add the hyperlink here!

#### __Recommended Setup for your project:__

```bash
your_project_folder/
├── data/                    # Folder for input data
├── SmartDataLoader/         # SmartDataLoader package
├── ...                      # Other files or directories in your project
├── your_script.py           # Python script (if you're using one)
└── your_notebook.ipynb      # Jupyter notebook (if you're using one)
```
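If you are unsure whether your project matches this layout, a quick check along the following lines can help. This is only a sketch: `EXPECTED_FOLDERS` and `missing_folders` are hypothetical names, and the folder names are taken from the tree above.

```python
from pathlib import Path

# Folder names taken from the recommended layout above.
EXPECTED_FOLDERS = ("data", "SmartDataLoader")


def missing_folders(project_root):
    """Return the expected folders that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# Check the current working directory against the recommended layout.
missing = missing_folders(Path.cwd())
if missing:
    print("Missing folders:", ", ".join(missing))
else:
    print("Project layout matches the recommended setup.")
```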

In [1]:
# Import libraries
import os
import sys
import pandas as pd

# Add the utils directory to the sys path
sys.path.append(os.path.abspath(os.path.join('..', 'utils')))

# Import user defined modules
from path_manager import PathManager
from data_ingestor import DataFactory
from data_loader import DataLoader
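Re-running the import cell appends the same directory to `sys.path` each time. That is harmless, but the list grows with duplicates. A small guard avoids this; `add_to_sys_path` is a hypothetical helper, not part of `SmartDataLoader`.

```python
import os
import sys


def add_to_sys_path(directory):
    """Append the absolute form of `directory` to sys.path unless already present."""
    absolute = os.path.abspath(directory)
    if absolute not in sys.path:
        sys.path.append(absolute)
    return absolute


# Same directory as in the cell above, added at most once.
utils_dir = add_to_sys_path(os.path.join('..', 'utils'))
```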

## <font color='#696969'>Directory Setup</font>

In these examples, the root folder is the `SmartDataLoader` folder itself, so the root path has to be configured explicitly.

**By default, the Path Manager assumes that the `SmartDataLoader` folder is located within your project directory. This is the recommended setup, as it avoids the need for additional directory configuration.**

<br>

**⚠️ ATTENTION:** If your data folder is not named `data`, **ensure that you specify the correct folder name in the `data_folder` attribute.**

In [2]:
# Set the project root path by going one level up from the 'examples' folder
root_path = os.path.abspath(os.path.join(os.getcwd(), '..'))

# Set up the data path; in this example the data lives in the clean_data folder inside example_data
data_folder_name = os.path.join('example_data', 'clean_data')

# Set up the path manager instance
path_manager_instance = PathManager(root_path=root_path, data_folder=data_folder_name)

print(
    f'Root path: {path_manager_instance.get_project_root()}\n'
    f'Data path: {path_manager_instance.get_data_path()}\n'
    f'Metadata path: {path_manager_instance.get_metadata_path()}\n'
    f'Extracted Data path: {path_manager_instance.get_extracted_data_path()}'
)

Root path: /Users/franco/Desktop/SmartDataLoader
Data path: /Users/franco/Desktop/SmartDataLoader/example_data/clean_data
Metadata path: /Users/franco/Desktop/SmartDataLoader/metadata
Extracted Data path: /Users/franco/Desktop/SmartDataLoader/extracted_data
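The printed paths suggest a simple layout: the data folder is joined onto the root, while `metadata` and `extracted_data` sit directly under it. The sketch below reproduces that layout with `pathlib`. It is inferred from the output above rather than from the `PathManager` source, and `derive_paths` is a hypothetical helper.

```python
from pathlib import Path


def derive_paths(root_path, data_folder="data"):
    """Build the four paths shown above from a root path and a data folder name."""
    root = Path(root_path)
    return {
        "root": root,
        "data": root / data_folder,
        # Assumption: these folder names match the printed output above.
        "metadata": root / "metadata",
        "extracted_data": root / "extracted_data",
    }


paths = derive_paths(
    "/Users/franco/Desktop/SmartDataLoader",
    Path("example_data") / "clean_data",
)
for name, path in paths.items():
    print(f"{name}: {path}")
```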


## <font color='#696969'>Ingest Data</font>

Now that the directories are set up, we can see the paths: the **metadata** and **extracted data** folders will be located within the project folder. **Remember**, the extracted data folder will only be created if there is a zip file present. 
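To anticipate whether an extracted data folder is likely to appear, you can look for zip files in the data folder yourself. This is a sketch with a hypothetical helper, `find_zip_files`. It assumes archives are recognized by the `.zip` extension and sit at the top level of the data folder, which may differ from what the ingestor actually does.

```python
from pathlib import Path


def find_zip_files(data_path):
    """Return the top-level files in data_path whose extension is .zip."""
    return sorted(
        entry
        for entry in Path(data_path).iterdir()
        # Compare the lowercased suffix so that .ZIP is matched as well.
        if entry.is_file() and entry.suffix.lower() == ".zip"
    )
```

For example, `find_zip_files(path_manager_instance.get_data_path())` returns an empty list when the data folder contains no zip files.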

We will now ingest the data. In this case, we pass the paths we created so that the ingestor can build the **metadata** that makes importing the files faster.

<br>

**⚠️ ATTENTION**: You only need to run the **ingestor** once. **Be aware** that if you update the metadata and then run the data ingestor again, your previous changes will be **deleted**.

In [3]:
ingestor = DataFactory(
    data_path=path_manager_instance.get_data_path(),
    metadata_path=path_manager_instance.get_metadata_path(),
    extracted_data_path=path_manager_instance.get_extracted_data_path()
)

ingestor.run_ingestion()

Metadata saved at /Users/franco/Desktop/SmartDataLoader/metadata/metadata.json
Data was successfully ingested.
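Because re-running the ingestor discards manual changes to the metadata, it can be worth copying `metadata.json` aside first. The sketch below uses only the standard library. `backup_metadata` is a hypothetical helper, and the file name is taken from the output above.

```python
import shutil
import time
from pathlib import Path


def backup_metadata(metadata_dir, filename="metadata.json"):
    """Copy the metadata file to a timestamped .bak file next to it.

    Returns the path of the copy, or None if there is no metadata file yet.
    """
    source = Path(metadata_dir) / filename
    if not source.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    # The copy ends in .bak so it is not mistaken for a JSON metadata file.
    target = source.with_name(f"{source.name}.{stamp}.bak")
    shutil.copy2(source, target)
    return target
```

Calling `backup_metadata(path_manager_instance.get_metadata_path())` before `ingestor.run_ingestion()` keeps a copy of any edits you made by hand.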


## <font color='#696969'>Load Data</font>

It's high time to load the data! __We just need to pass the metadata path__.


In [5]:
data_loader_instance = DataLoader(path_manager_instance.get_metadata_path())

There is 1 file in the input data folder.



Since we have just one file in the input data folder, the result of `load_data()` is assigned to a single variable named `df`.

In [None]:
df = data_loader_instance.load_data()