# Generating Fake Invoice Data

This notebook demonstrates how to generate fake invoice data using Python, specifically focusing on the fields relevant to a legal context. We will use the Faker library to create realistic data for testing and development purposes.

## Table of Contents
1. [Introduction](#introduction)
2. [Installation](#installation)
3. [Importing Libraries](#importing-libraries)
4. [Defining Data Fields](#defining-data-fields)
5. [Generating Data](#generating-data)
6. [Exporting to CSV](#exporting-to-csv)

## Introduction
In legal practice management, invoices are crucial for billing and tracking services. This notebook will guide you through creating fake invoice data, including various legal-specific fields such as invoice number, law firm name, matter details, timekeeper information, and billing details.

## Installation
First, ensure you have the required library installed. You can install the Faker library using pip:

```bash
pip install faker

## Importing Libraries

In [1]:
import pandas as pd
import numpy as np
from faker import Faker

## Defining Data Fields

We will define the following data fields for our invoice data:

•	**Invoice Number**: Unique identifier for each invoice.

•	**Invoice Date**: Date when the invoice was issued.

•	**Law Firm Name**: Name of the law firm providing the services.

•	**Law Firm Location**: Location of the law firm 

•	**Matter Number**: Unique identifier for each legal matter or case.

•	**Matter Name**: Name or description of the legal matter.

•	**Matter Type**: Category of the legal matter (e.g., Litigation, Intellectual Property).

•	**Practice Group**: Specific area within the law firm handling the matter.

•	**Code**: Code representing the type of task performed (e.g., L100 for initial case assessment).

•	**Line Item Description**: Detailed description of the services provided.

•	**Timekeeper Name**: Name of the individual who performed the work.

•	**Timekeeper Classification**: Role or level of the timekeeper (e.g., Partner, Associate).

•	**Hours Billed**: Number of hours billed for the task.

•	**Hourly Rate**: Billing rate per hour for the timekeeper.

•	**Amount**: Total cost for the line item (Hours Billed × Hourly Rate).

•	**Date of Service**: Date when the service was performed.

## Generating Data

In [2]:
# Initialize Faker
faker = Faker()

# Set random seed for reproducibility
np.random.seed(42)

# Sample data generation parameters
num_rows = 10000
law_firms = [faker.company() for _ in range(10)]
locations = [faker.city() for _ in range(10)]
matter_types = ['Litigation', 'Intellectual Property', 'Corporate', 'Real Estate', 'Employment']
practice_groups = ['Corporate Law', 'Family Law', 'Criminal Law', 'Intellectual Property', 'Tax Law']
task_codes = [f'L{str(i).zfill(3)}' for i in range(100, 200)] # Starting with L
timekeeper_names = [faker.name() for _ in range(50)]
timekeeper_classes = ['Partner', 'Associate', 'Paralegal', 'Junior Associate']

# Generate the data
data = {
    "Invoice Number": [faker.uuid4() for _ in range(num_rows)],
    "Invoice Date": [faker.date_this_decade() for _ in range(num_rows)],
    "Law Firm Name": np.random.choice(law_firms, num_rows),
    "Law Firm Location": np.random.choice(locations, num_rows),
    "Matter Number": [faker.uuid4() for _ in range(num_rows)],
    "Matter Name": [faker.sentence(nb_words=5) for _ in range(num_rows)],
    "Matter Type": np.random.choice(matter_types, num_rows),
    "Practice Group": np.random.choice(practice_groups, num_rows),
    "Task Code": np.random.choice(task_codes, num_rows),
    "Line Item Description": [faker.text(max_nb_chars=50) for _ in range(num_rows)],
    "Timekeeper Name": np.random.choice(timekeeper_names, num_rows),
    "Timekeeper Classification": np.random.choice(timekeeper_classes, num_rows),
    "Hours Billed": np.random.randint(1, 10, num_rows),  # 1 to 9 hours
    "Hourly Rate": np.random.randint(150, 600, num_rows),  # Rates between $150 and $600
    "Amount": np.random.randint(150, 600, num_rows) * np.random.randint(1, 10, num_rows),  # Total cost
    "Date of Service": [faker.date_this_decade() for _ in range(num_rows)],
}

# Create a DataFrame
invoice_df = pd.DataFrame(data)


## Exporting to CSV

Finally, we will export the generated invoice data to a CSV file for further use.

In [3]:
invoice_df

Unnamed: 0,Invoice Number,Invoice Date,Law Firm Name,Law Firm Location,Matter Number,Matter Name,Matter Type,Practice Group,Task Code,Line Item Description,Timekeeper Name,Timekeeper Classification,Hours Billed,Hourly Rate,Amount,Date of Service
0,ad32a91f-0c7d-4a82-a166-b918ec46458b,2020-04-10,Walker Group,New Alyssaton,e6367985-45df-4ca3-8288-04118e6bbc7f,Space use end way.,Real Estate,Family Law,L193,Under save recently.,Chris Olsen,Associate,8,406,1604,2022-09-18
1,af97db07-bce7-46dc-a266-9018b84bf015,2022-11-24,Stewart-Mendoza,West James,03e90f46-bf24-4060-90d0-027e7fff1e1f,Service truth speech somebody.,Employment,Criminal Law,L138,Place green rich myself lay save onto.,Kelly Tucker,Paralegal,8,226,2325,2023-02-18
2,8c321990-2ade-46bf-9305-ff7b3acae9fb,2022-01-20,"Kidd, Tapia and Kane",New Alyssaton,a625b186-7341-4702-85a1-8009f03426cf,Nearly pressure ball father local early.,Employment,Corporate Law,L117,Hot help you money form.,Kari Chung,Associate,6,578,2405,2021-09-17
3,b4584295-07ff-4af8-8d81-6cd2bf22e870,2023-05-16,Stokes-Pearson,North Brenda,f3a0fd0c-5bc0-4ddc-bfca-56158994f12a,Budget bed break administration generation.,Employment,Criminal Law,L159,Raise economy simply.,Joel Conrad,Associate,5,153,1580,2021-06-30
4,65430c74-911d-4148-bae1-7a026ad809e9,2021-07-26,Walker Group,East Shawn,fc096d3b-d054-45f0-8029-63fc14a1a776,Show find stay stock old plan.,Real Estate,Corporate Law,L185,Company prove coach here throw.,Joel Conrad,Associate,4,460,4824,2020-08-27
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9995,c652244e-06e7-4471-98a0-58ce54fd5a00,2022-12-04,Walker Group,South Maryview,0b183a5c-87f7-4023-9812-0add380beb37,Red these lay no.,Intellectual Property,Family Law,L124,Prove Democrat as. Each own successful reduce.,Dylan Fernandez,Partner,8,261,1090,2021-06-05
9996,788f74c2-3e06-40ba-9f5e-077f1d1ff85f,2020-03-16,"Wells, Ramirez and Ortiz",South Maryview,fe47ad4a-6060-411e-b3a4-92437e15a0aa,We ever bit black.,Litigation,Intellectual Property,L113,Great pass offer decide. Kind everyone old loss.,Kristen Kemp,Associate,9,431,2584,2021-10-24
9997,7afa3f85-fd79-4a48-86d4-24c7df40cdbd,2024-06-23,"Robinson, Jackson and Avila",West Teresa,eb374933-ddf6-463b-92ef-b15e3fef3e5d,Billion he foot skill.,Intellectual Property,Corporate Law,L144,They personal woman hard early operation.,Andrew Compton,Partner,3,427,2905,2023-01-05
9998,bb73dadd-87d8-4f03-8551-1f0a0f6e1939,2024-10-23,"Wells, Ramirez and Ortiz",South Maryview,31061e5d-5453-41ff-805c-7d2c4fe58ead,Recognize lawyer gun later these.,Corporate,Intellectual Property,L143,Group beat national good drive affect.,Crystal Payne,Paralegal,8,548,1520,2021-11-21


In [4]:
# Export to CSV
output_file_path = 'sample_invoices.csv'
invoice_df.to_csv(output_file_path, index=False)

print(f"Generated invoice data saved to {output_file_path}")

Generated invoice data saved to sample_invoices.csv


## Conclusion

In this notebook, we demonstrated how to generate fake invoice data relevant to legal practice management using Python. This fake data can be used for testing, development, or demonstrating software functionalities without exposing sensitive information.

---

# More testing

In [5]:
faker.date_this_decade()

datetime.date(2022, 5, 23)

In [6]:
faker.sentence(nb_words=5) 

'Paper police debate nation lot.'

In [7]:
faker.uuid4()

'98b20f0c-2c37-4b10-95dd-df81fea014e2'

In [10]:
faker.text(max_nb_chars=50)

'Machine indeed clear remain line hair.'

### More customized legal descriptions with a fixed list of legal activities

In [11]:
# Generate a comprehensive list of line item descriptions with more variations
line_item_descriptions = [
    "Initial consultation with client",
    "Drafting of legal documents",
    "Review of contractual obligations",
    "Negotiation of settlement terms",
    "Filing of court documents",
    "Research on case law and statutes",
    "Preparation of discovery requests",
    "Review of opposing counsel's documents",
    "Attendance at court hearings",
    "Preparation of trial exhibits",
    "Client communication and updates",
    "Legal analysis and opinion",
    "Drafting of motions and briefs",
    "Preparation for depositions",
    "Compliance review for regulations",
    "Conducting due diligence investigations",
    "Preparation of case summaries",
    "Development of case strategy",
    "Expert witness coordination",
    "Review of legal invoices for accuracy",
    "Legal research on trademark issues",
    "Translation of legal documents",
    "Legal opinion on tax matters",
    "Review of merger and acquisition agreements",
    "Preparation of legal memoranda",
    "Conducting risk assessments",
    "Review of corporate policies",
    "Managing legal hold processes",
    "Assessing liability in claims",
    "Legal training and compliance programs",
    "Collaborating with external legal teams",
    "Coordinating with compliance departments",
    "Advising on regulatory changes",
    "Handling patent applications",
    "Negotiating contract terms",
    "Reviewing lease agreements",
    "Providing legal opinions",
    "Drafting employee contracts",
    "Managing litigation budgets",
    "Conducting fraud investigations",
    "Assisting with settlement negotiations",
    "Preparing trial strategies",
    "Reviewing construction contracts",
    "Conducting audits for compliance",
    "Advising on data privacy laws",
    "Reviewing intellectual property portfolios",
    "Assisting in mergers and acquisitions",
    "Developing internal policies",
    "Managing client relationships",
    "Handling appeals and motions",
    "Providing general counsel services",
    "Drafting corporate resolutions",
    "Reviewing safety regulations",
    "Advising on international trade laws",
    "Handling immigration issues",
    "Conducting anti-money laundering checks",
    "Reviewing shareholder agreements",
    "Drafting waivers and releases",
    "Managing employee disputes",
    "Reviewing insurance claims",
    "Assisting in criminal defense",
    "Providing pro bono legal services",
    "Handling alternative dispute resolutions",
]


In [12]:
np.random.choice(line_item_descriptions, num_rows)

array(['Handling patent applications', 'Reviewing insurance claims',
       'Drafting of legal documents', ...,
       'Handling alternative dispute resolutions',
       'Preparing trial strategies', 'Managing litigation budgets'],
      dtype='<U43')