# 📦 First Step: Install Required Packages

Before you start using this notebook, make sure all the necessary Python packages are installed. These packages are listed in the `requirements.txt` file.

## 💻 How to Install

Open your terminal or command prompt, navigate to the project directory (where `requirements.txt` is located), and run:

```bash
pip install -r requirements.txt
```

This command installs every dependency listed in `requirements.txt`, which covers everything the notebook and the project need to run.

> ✅ **Tip:** If you're using a virtual environment (recommended), make sure it's activated before running the command.
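
To confirm that the installation worked, you can check which packages from `requirements.txt` are still missing. The following is a minimal sketch that uses only the standard library. The helper name `missing_packages` is hypothetical and not part of this project, and the name parsing is deliberately simple: it ignores extras, environment markers, and URL-based requirements.

```python
import re
from importlib import metadata
from pathlib import Path

# Matches the distribution name at the start of a requirement line,
# e.g. "pandas>=2.0" -> "pandas". Simplified on purpose.
_NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_packages(requirement_lines):
    """Return the names of listed packages that are not installed."""
    missing = []
    for raw in requirement_lines:
        line = raw.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt"
        if not line or line.startswith(("#", "-")):
            continue
        match = _NAME_RE.match(line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Only runs the check when a requirements.txt is present in the current folder
req_file = Path("requirements.txt")
if req_file.exists():
    print("Missing packages:", missing_packages(req_file.read_text().splitlines()))
```

An empty list means every listed package was found in the active environment.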

## 🛠️ Troubleshooting

* If you get a "permission denied" error, try adding `--user`:

  ```bash
  pip install --user -r requirements.txt
  ```

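* If you're working in Jupyter and want to install directly from a notebook cell, use the `%pip` magic. It installs into the environment of the running kernel, whereas `!pip` may target a different Python installation:

  ```python
  %pip install -r requirements.txt
  ```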

# 🔧 Second Step: Create and Register Your Project Folder

In this step, you'll register the **main project folder**, so the system knows where to store and retrieve all files related to your project.

This setup automatically creates a structured folder system under your specified `main_folder`, and stores the configuration in a central JSON file. This ensures your projects remain organized, especially when working with multiple studies or datasets.

---

## 🗂️ Project Folder Structure

On the first run, the following folder structure will be automatically created under your specified `main_folder`:

```
main_folder/
├── database/
│   ├── scopus/
│   └── <project_name>_database.xlsx  ← auto-generated path (the file itself is not created)
├── pdf/
└── xml/
```

---

## 💡 Example

Suppose your project name is `corona_discharge`, and you want to store all project files under:

```
G:\My Drive\research_related\ear_eog
```

You can register this setup by running:

```python
project_folder(
    project_review='corona_discharge',
    main_folder=r'G:\My Drive\research_related\ear_eog'
)
```

✅ This will:

* Save the project path in `setting/project_folders.json`
* Create the full folder structure: `database/scopus`, `pdf`, and `xml`

---

## 🔁 Loading the Project Later

Once registered, you can access the project folders in future sessions by providing just the `project_review` name:

```python
paths = project_folder(project_review='corona_discharge')

print(paths['main_folder'])  # Main project folder
print(paths['csv_path'])     # Path to the Excel database file
```
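
The returned dictionary exposes `main_folder` and `csv_path`, so paths to the other project folders can be derived from `main_folder`. A minimal sketch, assuming the folder layout described above; `project_subfolders` is a hypothetical helper and not part of the project:

```python
from pathlib import Path


def project_subfolders(main_folder):
    """Build the documented subfolder paths under main_folder."""
    root = Path(main_folder)
    return {
        "scopus": root / "database" / "scopus",
        "pdf": root / "pdf",
        "xml": root / "xml",
    }


# Example with a placeholder path; in practice pass paths['main_folder']
folders = project_subfolders("my_project")
print(folders["scopus"])
```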

---

## ⚙️ What Happens in the Background?

* A file named `project_folders.json` is stored in the `setting/` directory within your project.
* It maps each `project_review` to its corresponding `main_folder`.
* Folder structure is created automatically on the first run.
* On subsequent runs, the system reads the JSON file to locate your project, so there is no need to re-enter paths.
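
The behaviour described above can be sketched roughly as follows. This is not the actual implementation of `project_folder`; `register_project` is a hypothetical stand-in, and the JSON layout (a flat mapping from project name to folder) is an assumption based on the description.

```python
import json
import tempfile
from pathlib import Path


def register_project(project_review, main_folder=None,
                     registry_file="setting/project_folders.json"):
    """Hypothetical stand-in for project_folder: register or look up a project."""
    registry_path = Path(registry_file)
    registry = {}
    if registry_path.exists():
        registry = json.loads(registry_path.read_text(encoding="utf-8"))

    if main_folder is not None:
        # First run: remember the folder and create the documented structure
        registry[project_review] = str(main_folder)
        registry_path.parent.mkdir(parents=True, exist_ok=True)
        registry_path.write_text(json.dumps(registry, indent=2), encoding="utf-8")
        root = Path(main_folder)
        for sub in (("database", "scopus"), ("pdf",), ("xml",)):
            root.joinpath(*sub).mkdir(parents=True, exist_ok=True)
    elif project_review not in registry:
        raise KeyError(f"Project {project_review!r} is not registered yet")

    root = Path(registry[project_review])
    return {
        "main_folder": str(root),
        # Only the path is built here; the Excel file itself is not created
        "csv_path": str(root / "database" / f"{project_review}_database.xlsx"),
    }


# Demo in a throwaway directory so nothing is left on disk afterwards
with tempfile.TemporaryDirectory() as tmp:
    registry = Path(tmp) / "setting" / "project_folders.json"
    first = register_project("corona_discharge", Path(tmp) / "my_project", registry)
    later = register_project("corona_discharge", registry_file=registry)
    scopus_created = (Path(tmp) / "my_project" / "database" / "scopus").is_dir()
```

In the demo, the second call passes only the project name and gets back the same paths as the first, which mirrors the "load later" workflow.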


In [6]:
from setting.project_path import project_folder

project_name = 'corona_discharge'
main_folder = r"D:\my_project"

# Register the project and create its folder structure
project_folder(project_review=project_name, main_folder=main_folder)

{'main_folder': 'D:\\my_project',
 'csv_path': 'D:\\my_project\\database\\corona_discharge_database.xlsx'}

# 📥 Third Step: Download the Scopus BibTeX File

In this step, you'll use the **Scopus database** to find and download relevant papers for your project. Scopus is a comprehensive and widely used repository of peer-reviewed academic literature.

---

## 🔍 Using Scopus Advanced Search

To retrieve high-quality and relevant papers, we recommend using **Scopus' Advanced Search** feature. This powerful tool lets you refine your search based on:

* Keywords
* Authors
* Publication dates
* Document types
* And more...

This ensures that your literature collection is both targeted and comprehensive.
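
If you build queries from a keyword list, a small helper can keep the syntax consistent. The sketch below assembles a query string from the Scopus field codes `TITLE-ABS-KEY`, `PUBYEAR`, and `DOCTYPE`. The function name `build_scopus_query` and the second keyword are illustrative assumptions; check the resulting string in the Scopus Advanced Search box before relying on it.

```python
def build_scopus_query(keywords, after_year=None, doc_type=None):
    """Combine keywords into a Scopus Advanced Search query string."""
    # Quote each keyword so that multi-word terms are searched as a phrase
    terms = " OR ".join(f'"{kw}"' for kw in keywords)
    parts = [f"TITLE-ABS-KEY({terms})"]
    if after_year is not None:
        # PUBYEAR > 2015 keeps documents published in 2016 or later
        parts.append(f"PUBYEAR > {after_year}")
    if doc_type is not None:
        # "ar" is the Scopus document-type code for journal articles
        parts.append(f"DOCTYPE({doc_type})")
    return " AND ".join(parts)


query = build_scopus_query(
    ["corona discharge", "partial discharge"], after_year=2015, doc_type="ar"
)
print(query)
# TITLE-ABS-KEY("corona discharge" OR "partial discharge") AND PUBYEAR > 2015 AND DOCTYPE(ar)
```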

---

## 💡 Get Keyword Ideas with a Prompt

To help you formulate effective search queries, you can use the following **prompt-based suggestion tool**:

👉 [Keyword Search Prompt](https://gist.github.com/balandongiv/886437963d38252e61634ddc00b9d983)

You may need to modify the prompt to better suit your research domain. Here are some example domains:

* `"corona discharge"`
* `"fatigue driving EEG"`
* `"wafer classification"`

Feel free to add, remove, or tweak keywords as needed to refine your search results.

---

## 💾 Save and Organize Your Results

Once you've finalized your search:

1. **Select all available attributes** when exporting results from Scopus.
2. Choose the **BibTeX** format when saving the export file.
3. Save the file inside the `database/scopus/` folder of your project.

The resulting folder structure might look like this:

```
main_folder/
├── database/
│   └── scopus/
│       ├── scopus(1).bib
│       ├── scopus(2).bib
│       └── scopus(3).bib
```

Make sure each file keeps its `.bib` extension and is stored in `database/scopus/`, because the next step loads every `.bib` file from that folder.
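
Before moving on, you may want to confirm that the exports landed in the right place. A minimal sketch using only the standard library; `list_bib_files` is a hypothetical helper, and the demo uses a temporary folder with placeholder file names instead of a real project:

```python
import tempfile
from pathlib import Path


def list_bib_files(scopus_folder):
    """Return the .bib files in scopus_folder, sorted by name."""
    folder = Path(scopus_folder)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.suffix.lower() == ".bib")


# Demo: two exports plus an unrelated file that should be ignored
with tempfile.TemporaryDirectory() as tmp:
    for name in ("scopus(1).bib", "scopus(2).bib", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [p.name for p in list_bib_files(tmp)]

print(found)  # ['scopus(1).bib', 'scopus(2).bib']
```

For a real project, pass the `database/scopus` folder under your `main_folder`.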

# 📊 Fourth Step: Combine Scopus BibTeX Files into Excel

Once you've downloaded multiple `.bib` files from Scopus, the next step is to **combine and convert** them into a structured Excel file. This makes it easier to filter, sort, and review the metadata of all collected papers.

---

## 🧰 What This Step Does

* Loads all `.bib` files from your project's `database/scopus/` folder
* Parses the relevant metadata (e.g., title, authors, year, source, DOI)
* Combines the results into a single Excel spreadsheet
* Saves the spreadsheet in the `database/` folder as `combined_filtered.xlsx`
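
To make the parse-and-combine idea concrete, here is a rough standard-library sketch. It is not how `combine_scopus_bib_to_excel` is implemented: the regex handles only one `field = {value}` pair per line, the duplicate rule (same DOI, else same title) is an assumption, and the two sample entries are made-up placeholders.

```python
import re

# "@ARTICLE{key," at the start of a line opens a new entry
ENTRY_START = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)
# One "field = {value}" pair per line; nested braces inside the value are kept
FIELD = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*\{(.*)\}[ \t]*,?[ \t]*$", re.MULTILINE)


def parse_bib_text(text):
    """Parse BibTeX text into a list of dicts (simplified, not a full parser)."""
    records = []
    starts = list(ENTRY_START.finditer(text))
    for i, start in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        record = {"entry_type": start.group(1).lower(), "key": start.group(2)}
        for field in FIELD.finditer(text[start.end():end]):
            record[field.group(1).lower()] = field.group(2).strip()
        records.append(record)
    return records


def drop_duplicates(records):
    """Keep the first record per DOI, falling back to the lowercased title."""
    seen, unique = set(), []
    for rec in records:
        dedup_key = (rec.get("doi") or rec.get("title", "")).strip().lower()
        if dedup_key and dedup_key in seen:
            continue
        seen.add(dedup_key)
        unique.append(rec)
    return unique


# Placeholder entries: the same paper exported twice under different keys
SAMPLE = """@ARTICLE{Example2020,
    author = {Doe, J.},
    title = {An example title with {Braces}},
    year = {2020},
    doi = {10.0000/EXAMPLE}
}

@ARTICLE{Example2020a,
    author = {Doe, J.},
    title = {An example title with {Braces}},
    year = {2020},
    doi = {10.0000/example}
}
"""

records = parse_bib_text(SAMPLE)
unique = drop_duplicates(records)
print(f"Initial rows: {len(records)}, after duplicate removal: {len(unique)}")
```

A list of dicts like `unique` could then be turned into a spreadsheet, for example with pandas, which the project already uses.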



## 📁 Folder Structure Example

After running the script, your folder might look like this:

```
main_folder/
├── database/
│   ├── scopus/
│   │   ├── scopus(1).bib
│   │   ├── scopus(2).bib
│   │   └── scopus(3).bib
│   └── combined_filtered.xlsx
```

This Excel file will serve as your primary reference for filtering papers before downloading PDFs.


In [8]:
import os

from download_pdf.database_preparation import combine_scopus_bib_to_excel
from setting.project_path import project_folder

# Look up the paths of the registered project
project_review = 'corona_discharge'
path_dic = project_folder(project_review=project_review)
main_folder = path_dic['main_folder']

# Input folder with the Scopus exports and output path for the combined spreadsheet
folder_path = os.path.join(main_folder, 'database', 'scopus')
output_excel = os.path.join(main_folder, 'database', 'combined_filtered.xlsx')

combine_scopus_bib_to_excel(folder_path, output_excel)

Found 3 .bib files in D:\my_project\database\scopus
Initial number of rows: 7
Number of rows after duplicate removal: 5. Total duplicates removed: 2
These are the unique publisher_long values: ['Nature Research' 'Springer Science and Business Media B.V.'
 'BioMed Central Ltd']
['nature' 'springer' 'biomedcentral']
Combined file has been saved to: D:\my_project\database\combined_filtered.xlsx


# Combine to Master List


When you use `agent_name="section_sorter"`, the section IDs returned by the LLM sometimes do not match the ones in the master JSON file.
This makes it difficult to compile the child JSON files (i.e., all the JSON output from the LLM) into the master file.

The following script uses a different strategy to match the section IDs against the master JSON file:
`C:\Users\balan\IdeaProjects\academic_paper_maker\research_filter\section_sorter_to_json_masterfile.py`
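
The exact strategy used by that script is not described here. As an illustration only, one common approach is to try an exact match first, then a match after normalization, and finally a fuzzy match. The function names, the sample IDs, and the `cutoff` value below are all assumptions:

```python
import re
from difflib import get_close_matches


def normalize_id(section_id):
    """Lowercase and collapse every run of non-alphanumerics into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", section_id.lower()).strip("_")


def match_section_id(candidate, master_ids, cutoff=0.8):
    """Return the master ID that best matches candidate, or None."""
    if candidate in master_ids:
        return candidate  # 1) exact match
    normalized = {normalize_id(m): m for m in master_ids}
    key = normalize_id(candidate)
    if key in normalized:
        return normalized[key]  # 2) match after normalization
    # 3) fuzzy match on the normalized form
    close = get_close_matches(key, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None


# Hypothetical section IDs for illustration
master = ["1.1_introduction", "2.1_methods"]
print(match_section_id("1.1 Introduction", master))  # 1.1_introduction
print(match_section_id("2.1_method", master))        # 2.1_methods
print(match_section_id("zzz", master))               # None
```

Returning `None` for unmatched IDs lets you collect and review them by hand instead of silently assigning them to the wrong section.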

# Excel to BibTeX



In [None]:
import pandas as pd

from post_code_saviour.excel_to_bib import generate_bibtex

# Load the Excel file
file_path = r"C:\Users\balan\IdeaProjects\academic_paper_maker\bib_example\combined_filtered.xlsx"
output_path = r"C:\Users\balan\IdeaProjects\academic_paper_maker\bib_example\xcombined_filtered.bib"

df = pd.read_excel(file_path)

# Generate BibTeX and write it to output_path
generate_bibtex(df, output_file=output_path)
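
For orientation, converting one spreadsheet row into a BibTeX entry could look roughly like the sketch below. This is not the implementation of `generate_bibtex`; the helper `row_to_bibtex` and the column names (`key`, `author`, `title`, `journal`, `year`, `doi`) are assumptions. Missing values are assumed to be `None` or empty strings, so pandas `NaN` cells would need an extra `pd.isna` check.

```python
def row_to_bibtex(row, entry_type="article"):
    """Format one record (a dict of column name -> value) as a BibTeX entry."""
    key = row.get("key") or "unknown"
    lines = [f"@{entry_type}{{{key},"]
    for field in ("author", "title", "journal", "year", "doi"):
        value = row.get(field)
        if value is None or str(value).strip() == "":
            continue  # skip empty cells
        lines.append(f"  {field} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder record for illustration
entry = row_to_bibtex(
    {"key": "Example2020", "title": "An example title", "year": 2020, "doi": ""}
)
print(entry)
```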
