In [2]:

# Your provided text string:
project_text = """
1. Title Page
    1.1 Project Title: CKD Prediction Using Machine Learning Techniques
    1.2 Your Name
    1.3 Affiliation (e.g., University Name, Department)
    1.4 Submission Date
    1.5 Course Name (if applicable)
    1.6 Supervisor/Instructor Name (if applicable)
2. Abstract (Approximately 1 page)
    2.1 Briefly introduce Chronic Kidney Disease and its significance.
    2.2 State the aim of the project (CKD prediction using ML).
    2.3 Briefly mention the machine learning techniques used.
    2.4 Summarize the key findings and results.
    2.5 Briefly highlight the potential impact of the work.
    2.6 Keywords: Chronic Kidney Disease, CKD, Machine Learning, Prediction, Classification, Healthcare
3. Table of Contents (Approximately 1 page)
    3.1 List all sections and subsections with corresponding page numbers.
    3.2 Ensure clear and accurate organization for easy navigation.
4. Introduction (Approximately 2 pages)
    4.1 Background of Chronic Kidney Disease (CKD)
        4.1.1 Definition and prevalence of CKD globally and potentially in a specific region (e.g., India).
        4.1.2 Stages of CKD and their implications.
        4.1.3 Risk factors associated with CKD (e.g., diabetes, hypertension).
        4.1.4 Impact of CKD on individuals and healthcare systems (economic burden, mortality).
    4.2 The Role of Machine Learning in Healthcare
        4.2.1 Introduction to machine learning and its applications in medical diagnosis and prediction.
        4.2.2 Advantages of using machine learning in healthcare (e.g., early detection, personalized medicine).
        4.2.3 Overview of common machine learning techniques relevant to prediction tasks.
    4.3 Motivation for CKD Prediction using ML
        4.3.1 Challenges in traditional CKD diagnosis.
        4.3.2 Potential benefits of early and accurate CKD prediction (e.g., timely intervention, improved patient outcomes).
        4.3.3 Justification for using machine learning to address these challenges.
    4.4 Project Objectives
        4.4.1 Clearly state the specific goals of your project (e.g., to develop and evaluate a machine learning model for CKD prediction).
        4.4.2 List measurable objectives (e.g., achieve a certain level of accuracy, compare different ML algorithms).
    4.5 Structure of the Report
        4.5.1 Briefly outline the organization of the subsequent chapters.
5. Literature Review (Approximately 3 pages)
    5.1 Existing Research on CKD Prediction
        - Review of previous studies on CKD prediction using statistical methods and traditional approaches.
    5.2 Application of Machine Learning in CKD Prediction
        - Detailed analysis of existing literature on the use of various machine learning algorithms (e.g., Logistic Regression, Support Vector Machines, Decision Trees, Random Forests, Neural Networks) for CKD prediction.
        - Discussion of the datasets used in previous studies, including relevant features.
        - Analysis of the performance metrics reported in prior research (e.g., accuracy, sensitivity, specificity).
    5.3 Feature Selection and Engineering in CKD Prediction
        - Examination of different feature selection techniques used in CKD-related studies.
        - Discussion of the importance of feature engineering for improving prediction accuracy.
    5.4 Gaps in the Existing Literature
        - Identify any limitations or areas that have not been fully explored in previous research.
        - Highlight how your project aims to contribute to the existing body of knowledge.
6. Problem Statement / Objectives (Approximately 1 page - can be integrated into the Introduction if page count is a concern)
    6.1 Problem Statement
        - Clearly and concisely define the problem your project aims to address (e.g., the need for a more accurate and efficient CKD prediction model).
    6.2 Research Questions
        - Formulate specific questions that your project seeks to answer (e.g., Which machine learning algorithm yields the highest accuracy for CKD prediction using a specific dataset?).
    6.3 Project Objectives (Revisited and Detailed)
        - Provide a numbered list of specific, measurable, achievable, relevant, and time-bound (SMART) objectives for your project.
7. Methodology (Approximately 3 pages)
    7.1 Dataset Description
        - Detailed information about the dataset used for your project (e.g., source, size, number of features, data types).
        - Explanation of the features relevant to CKD prediction (e.g., serum creatinine, blood urea nitrogen, albumin).
        - Discussion of any data preprocessing steps performed (e.g., handling missing values, data scaling, normalization).
    7.2 Machine Learning Algorithms
        - Detailed explanation of the machine learning algorithms selected for your project (e.g., Logistic Regression, Random Forest, Neural Network).
        - Theoretical background of each algorithm.
        - Justification for choosing these specific algorithms.
    7.3 Feature Selection/Engineering Techniques (if applicable)
        - Explanation of any feature selection methods used (e.g., correlation analysis, chi-squared test, Recursive Feature Elimination).
        - Description of any new features engineered (if applicable) and the rationale behind them.
    7.4 Model Development and Training
        - Steps involved in training the machine learning models.
        - Explanation of the training and validation split of the data.
        - Hyperparameter tuning techniques used (e.g., grid search, cross-validation).
    7.5 Evaluation Metrics
        - Detailed description of the evaluation metrics used to assess the performance of the models (e.g., accuracy, precision, recall, F1-score, AUC-ROC curve).
        - Explanation of why these metrics are appropriate for the CKD prediction task.
    7.6 Implementation Environment
        - Software and libraries used for implementation (e.g., Python, scikit-learn, TensorFlow, Keras).
        - Hardware specifications (if relevant).
8. Experiments / Implementation (Approximately 2 pages)
    8.1 Experimental Setup
        - Detailed description of how the experiments were conducted.
        - Step-by-step procedure for data preprocessing, model training, and evaluation.
        - Configuration details of the machine learning models (e.g., specific hyperparameter values used).
    8.2 Implementation Details
        - Code snippets or descriptions of key implementation steps (without including the entire codebase in the main body â€“ this can go in the Appendix).
        - Explanation of any challenges faced during implementation and how they were addressed.
9. Results and Analysis (Approximately 3 pages)
    9.1 Performance of Individual Models
        - Present the results of each trained machine learning model using the chosen evaluation metrics (e.g., tables, figures).
        - Provide a detailed analysis of the performance of each model.
    9.2 Comparison of Different Models
        - Compare the performance of the different machine learning models.
        - Identify the best-performing model based on the evaluation metrics.
        - Discuss the reasons for the observed differences in performance.
    9.3 Analysis of Feature Importance (if applicable)
        - If your chosen models allow for feature importance analysis (e.g., Random Forest), present and discuss the most important features for CKD prediction.
        - Relate the identified important features to known risk factors of CKD.
    9.4 Visualization of Results
        - Use appropriate visualizations (e.g., confusion matrices, ROC curves, bar charts) to illustrate the results and facilitate understanding.
10. Discussion (Approximately 2 pages)
    10.1 Interpretation of Findings
        - Discuss the significance of your findings in the context of the research questions and objectives.
        - Interpret the performance of the best-performing model and its implications for CKD prediction.
    10.2 Comparison with Existing Literature
        - Compare your results with the findings of previous studies discussed in the literature review.
        - Highlight any similarities or differences and provide possible explanations.
    10.3 Strengths and Limitations of the Study
        - Discuss the strengths of your methodology and findings.
        - Acknowledge any limitations of your study (e.g., dataset size, potential biases, generalizability).
    10.4 Ethical Considerations (if applicable)
        - Briefly discuss any ethical considerations related to using healthcare data and developing predictive models.
11. Conclusion and Future Work (Approximately 1 page)
    11.1 Conclusion
        - Summarize the key findings and contributions of your project.
        - Restate whether the project objectives were achieved.
        - Briefly highlight the potential impact of your work.
    11.2 Future Work
        - Suggest potential directions for future research based on your findings and limitations.
        - Ideas for improvement, such as using larger and more diverse datasets, exploring more advanced machine learning techniques (e.g., deep learning), incorporating temporal data, or developing a user-friendly prediction tool.
12. References (Variable length)
    - List all the academic sources (e.g., journal articles, conference papers, books, reports) cited in your project using a consistent citation style (e.g., APA, MLA, IEEE).
    - Ensure all in-text citations have corresponding entries in the references section.
13. Appendices (Variable length)
    - Include any supplementary material that is not essential for the main body of the report but might be helpful for the reader (e.g., detailed code snippets, additional tables or figures, raw data summaries, consent forms if applicable).
"""


In [1]:
project_text= """
    Subject: Science & Engineering
    Chapter: Biology/Botany
    Topic: Rate of Photosynthesis

    1.  Title Page
        1.1. Title of the Project
        1.2. Author(s)
        1.3. Affiliation
        1.4. Date of Submission

    2.  Abstract (1 page)
        2.1. Summary of the study
        2.2. Methods used
        2.3. Key findings
        2.4. Conclusion

    3.  Table of Contents (1 page)

    4.  Introduction (3 pages)
        4.1. Background of Photosynthesis
            4.1.1. Importance of Photosynthesis in Ecosystems
            4.1.2. Basic Principles of Photosynthesis
        4.2. Factors Affecting Photosynthesis
        4.3. Objectives of the Study
        4.4. Hypothesis

    5.  Literature Review (5 pages)
        5.1. Previous Studies on Photosynthetic Rates
            5.1.1. Research on Different Plant Species
            5.1.2. Impact of Environmental Conditions (Light, CO2, Temperature)
        5.2. Methodologies Used in Photosynthesis Research
        5.3. Gaps in Existing Research

    6.  Materials and Methods (4 pages)
        6.1. Plant Materials
            6.1.1. Selection of Plant Species
            6.1.2. Preparation of Samples
        6.2. Experimental Setup
            6.2.1. Controlled Environmental Conditions
            6.2.2. Measurement Tools (e.g., Photosynthesis Measurement System) 
        6.3. Experimental Design
            6.3.1. Variables (Independent and Dependent)
            6.3.2. Control Groups
        6.4. Procedure
            6.4.1. Step-by-step explanation
        6.5. Data Collection
        6.6. Statistical Analysis

    7.  Results (5 pages)
        7.1. Photosynthetic Rates Under Different Conditions
            7.1.1. Tables and Graphs Representing Data
        7.2. Statistical Analysis Results
            7.2.1. Significance Testing

    8.  Discussion (4 pages)
        8.1. Interpretation of Results
        8.2. Comparison with Existing Literature
        8.3. Limitations of the Study
        8.4. Sources of Error

    9.  Conclusion (2 pages)
        9.1. Summary of Findings
        9.2. Implications of the Study
        9.3. Future Research Directions

    10. References (2 pages)

    11. Appendices
        11.1. Raw Data
        11.2. Statistical Analysis Output
        11.3. Equipment Specifications

    """

In [33]:
import re
from collections import defaultdict

# Sample input (assume you have the full project_text variable as defined earlier)
project_text = project_text  # Replace with your actual text

# Function to create a nested defaultdict
def nested_dict():
    return defaultdict(nested_dict)

# Store the hierarchy
hierarchy = nested_dict()

# Split by lines and parse
for line in project_text.strip().split('\n'):
    line = line.strip()
    if not line:
        continue

    # Match the heading number and text
    match = re.match(r"^(\d+(?:\.\d+)*)(?:\s+)?(.*)", line)
    if match:
        keys = match.group(1).split('.')
        title = match.group(2).strip()
        
        # Traverse the nested dict using keys
        d = hierarchy
        for key in keys[:-1]:
            d = d[key]
        d[keys[-1]] = {"title": title, "subtopics": {}}
    else:
        # In case of continuation or subcontent, ignore or handle if needed
        continue

# Function to print the hierarchy in readable format
def print_structure(d, indent=0):
    for key in sorted(d.keys(), key=lambda x: (int(x) if x.isdigit() else x)):
        item = d[key]
        print('    ' * indent + f"{key}. {item['title']}")
        if item['subtopics']:
            print_structure(item['subtopics'], indent + 1)

# Convert structure into final readable form
def flatten_structure(d):
    flat = {}
    for k, v in d.items():
        flat[k] = {"title": v["title"], "subtopics": flatten_structure(v["subtopics"])}
    return flat

# To print
print_structure(hierarchy)

# If you want the structured dict
nested_outline = flatten_structure(hierarchy)


1. .  Title Page
2. .  Abstract (1 page)
3. .  Table of Contents (1 page)
4. .  Introduction (3 pages)
5. .  Literature Review (5 pages)
6. .  Materials and Methods (4 pages)
7. .  Results (5 pages)
8. .  Discussion (4 pages)
9. .  Conclusion (2 pages)
10. . References (2 pages)
11. . Appendices


In [None]:
import re
from collections import defaultdict

# Function to create nested dictionaries automatically
def nested_dict():
    return defaultdict(nested_dict)

# Initialize the root of the structure
outline = nested_dict()

# Replace with your full project outline as a string
project_text = project_text.strip()
# Build the nested dict
for line in project_text.strip().split('\n'):
    match = re.match(r'^(\d+(?:\.\d+)*)(?:\s+)?(.*)', line.strip())
    if match:
        key_parts = match.group(1).split('.')
        title = match.group(2).strip()

        # Traverse into the nested dictionary
        current = outline
        for part in key_parts[:-1]:
            current = current[part]["subtopics"]
        current[key_parts[-1]] = {"title": title, "subtopics": {}}

# Optional: Convert defaultdict to plain dict
def dictify(d):
    return {
        k: {"title": v["title"], "subtopics": dictify(v["subtopics"])}
        for k, v in d.items()
    }

nested_outline = dictify(outline)


In [24]:
nested_outline

{'1': {'title': '.  Title Page',
  'subtopics': {'1': {'title': '. Title of the Project', 'subtopics': {}},
   '2': {'title': '. Author(s)', 'subtopics': {}},
   '3': {'title': '. Affiliation', 'subtopics': {}},
   '4': {'title': '. Date of Submission', 'subtopics': {}}}},
 '2': {'title': '.  Abstract (1 page)',
  'subtopics': {'1': {'title': '. Summary of the study', 'subtopics': {}},
   '2': {'title': '. Methods used', 'subtopics': {}},
   '3': {'title': '. Key findings', 'subtopics': {}},
   '4': {'title': '. Conclusion', 'subtopics': {}}}},
 '3': {'title': '.  Table of Contents (1 page)', 'subtopics': {}},
 '4': {'title': '.  Introduction (3 pages)',
  'subtopics': {'1': {'title': '. Background of Photosynthesis',
    'subtopics': {'1': {'title': '. Importance of Photosynthesis in Ecosystems',
      'subtopics': {}},
     '2': {'title': '. Basic Principles of Photosynthesis', 'subtopics': {}}}},
   '2': {'title': '. Factors Affecting Photosynthesis', 'subtopics': {}},
   '3': {'tit

In [None]:
def yield_sections(data, prefix=""):
    for key, value in data.items():
        section_number = f"{prefix}.{key}" if prefix else key
        title = value["title"]
        subtopics = value["subtopics"]

        yield {
            "section": section_number,
            "title": title,
            "children": list(yield_sections(subtopics, section_number))
        }


In [6]:
for section in yield_sections(nested_outline):
    print(section)
    print("\n")

{'section': '1', 'title': '. Title Page', 'children': [{'section': '1.1', 'title': 'Project Title: CKD Prediction Using Machine Learning Techniques', 'children': []}, {'section': '1.2', 'title': 'Your Name', 'children': []}, {'section': '1.3', 'title': 'Affiliation (e.g., University Name, Department)', 'children': []}, {'section': '1.4', 'title': 'Submission Date', 'children': []}, {'section': '1.5', 'title': 'Course Name (if applicable)', 'children': []}, {'section': '1.6', 'title': 'Supervisor/Instructor Name (if applicable)', 'children': []}]}


{'section': '2', 'title': '. Abstract (Approximately 1 page)', 'children': [{'section': '2.1', 'title': 'Briefly introduce Chronic Kidney Disease and its significance.', 'children': []}, {'section': '2.2', 'title': 'State the aim of the project (CKD prediction using ML).', 'children': []}, {'section': '2.3', 'title': 'Briefly mention the machine learning techniques used.', 'children': []}, {'section': '2.4', 'title': 'Summarize the key findi

In [2]:
def section_generator(data):
    stack = [(k, v, "") for k, v in sorted(data.items())]

    while stack:
        key, value, prefix = stack.pop(0)
        section_number = f"{prefix}.{key}" if prefix else key
        title = value["title"]
        subtopics = value["subtopics"]

        # Prepare children to yield later
        children = []
        if subtopics:
            child_gen = section_generator(subtopics)
            for child in child_gen:
                children.append(child)

        yield {
            "section": section_number,
            "title": title,
            "children": children
        }


In [30]:
gen = section_generator(nested_outline)

In [31]:
try:
    section = next(gen)
    print(section)
except StopIteration:
    print("No more sections.")



['1', '.  Title Page', [['1', '. Title of the Project', []], ['2', '. Author(s)', []], ['3', '. Affiliation', []], ['4', '. Date of Submission', []]]]


pick from there 

In [1]:
project_text= """
    Subject: Science & Engineering
    Chapter: Biology/Botany
    Topic: Rate of Photosynthesis

    1.  Title Page
        1.1. Title of the Project
        1.2. Author(s)
        1.3. Affiliation
        1.4. Date of Submission

    2.  Abstract (1 page)
        2.1. Summary of the study
        2.2. Methods used
        2.3. Key findings
        2.4. Conclusion

    3.  Table of Contents (1 page)

    4.  Introduction (3 pages)
        4.1. Background of Photosynthesis
            4.1.1. Importance of Photosynthesis in Ecosystems
            4.1.2. Basic Principles of Photosynthesis
        4.2. Factors Affecting Photosynthesis
        4.3. Objectives of the Study
        4.4. Hypothesis

    5.  Literature Review (5 pages)
        5.1. Previous Studies on Photosynthetic Rates
            5.1.1. Research on Different Plant Species
            5.1.2. Impact of Environmental Conditions (Light, CO2, Temperature)
        5.2. Methodologies Used in Photosynthesis Research
        5.3. Gaps in Existing Research

    6.  Materials and Methods (4 pages)
        6.1. Plant Materials
            6.1.1. Selection of Plant Species
            6.1.2. Preparation of Samples
        6.2. Experimental Setup
            6.2.1. Controlled Environmental Conditions
            6.2.2. Measurement Tools (e.g., Photosynthesis Measurement System) 
        6.3. Experimental Design
            6.3.1. Variables (Independent and Dependent)
            6.3.2. Control Groups
        6.4. Procedure
            6.4.1. Step-by-step explanation
        6.5. Data Collection
        6.6. Statistical Analysis

    7.  Results (5 pages)
        7.1. Photosynthetic Rates Under Different Conditions
            7.1.1. Tables and Graphs Representing Data
        7.2. Statistical Analysis Results
            7.2.1. Significance Testing

    8.  Discussion (4 pages)
        8.1. Interpretation of Results
        8.2. Comparison with Existing Literature
        8.3. Limitations of the Study
        8.4. Sources of Error

    9.  Conclusion (2 pages)
        9.1. Summary of Findings
        9.2. Implications of the Study
        9.3. Future Research Directions

    10. References (2 pages)

    11. Appendices
        11.1. Raw Data
        11.2. Statistical Analysis Output
        11.3. Equipment Specifications

    """

In [2]:
import re
from collections import defaultdict

def parse_outline(text):
    outline = {}
    stack = []

    for line in text.strip().splitlines():
        line = line.strip()
        match = re.match(r"^(\d+(\.\d+)*)(\.)?\s+(.*)", line)
        if match:
            number = match.group(1)
            title = match.group(4)
            level = number.count('.')
            node = {"number": number, "title": title, "children": {}}

            if level == 0:
                outline[number] = node
                stack = [(number, node)]
            else:
                while len(stack) > level:
                    stack.pop()
                parent = stack[-1][1]
                parent["children"][number] = node
                stack.append((number, node))
    return outline


def yield_sections(outline):
    for key in sorted(outline, key=lambda x: float(x)):
        section = outline[key]
        title_line = f"{section['number']}. {section['title']}"
        subtopic_lines = []
        for sub_key in sorted(section["children"], key=lambda x: list(map(float, x.split('.')))):
            sub = section["children"][sub_key]
            subtopic_lines.append(f"{sub['number']}. {sub['title']}")
        yield title_line, subtopic_lines


        


In [3]:
project_outline = parse_outline(project_text)

In [4]:
project_outline

{'1': {'number': '1',
  'title': 'Title Page',
  'children': {'1.1': {'number': '1.1',
    'title': 'Title of the Project',
    'children': {}},
   '1.2': {'number': '1.2', 'title': 'Author(s)', 'children': {}},
   '1.3': {'number': '1.3', 'title': 'Affiliation', 'children': {}},
   '1.4': {'number': '1.4', 'title': 'Date of Submission', 'children': {}}}},
 '2': {'number': '2',
  'title': 'Abstract (1 page)',
  'children': {'2.1': {'number': '2.1',
    'title': 'Summary of the study',
    'children': {}},
   '2.2': {'number': '2.2', 'title': 'Methods used', 'children': {}},
   '2.3': {'number': '2.3', 'title': 'Key findings', 'children': {}},
   '2.4': {'number': '2.4', 'title': 'Conclusion', 'children': {}}}},
 '3': {'number': '3', 'title': 'Table of Contents (1 page)', 'children': {}},
 '4': {'number': '4',
  'title': 'Introduction (3 pages)',
  'children': {'4.1': {'number': '4.1',
    'title': 'Background of Photosynthesis',
    'children': {'4.1.1': {'number': '4.1.1',
      'titl

In [5]:

section_generator = yield_sections(project_outline)

In [6]:
section_generator

<generator object yield_sections at 0x000001F8CD9E9900>

In [9]:
section = next(section_generator)

In [8]:
section

('1. Title Page',
 ['1.1. Title of the Project',
  '1.2. Author(s)',
  '1.3. Affiliation',
  '1.4. Date of Submission'])

In [10]:
section

('2. Abstract (1 page)',
 ['2.1. Summary of the study',
  '2.2. Methods used',
  '2.3. Key findings',
  '2.4. Conclusion'])