### 🤖 Chat with PDF

In [31]:
# Import the required libraries
import os
from pypdf import PdfReader
import dotenv
from openai import OpenAI
import chromadb
from IPython.display import display, Markdown

dotenv.load_dotenv()

True

### ⚙️ Configuration

In [None]:
PDF_FILE_PATH = "../../data/02-RAG_Systems/simple_rag/Classic_Airent-3.pdf"
CHROMA_COLLECTION_NAME = "datasheet_rag"
OPENAI_KEY = os.getenv("OPENAI_API_KEY")
OPENAI_ENDPOINT = os.getenv("OPENAI_ENDPOINT")

CHAT_DEPLOYMENT = "o3" # Decoder
EMBEDDING_DEPLOYMENT = "text-embedding-3-small" # Encoder
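
If an environment variable is missing, `os.getenv` returns `None` and the problem may only surface later, when the client makes its first request. A minimal sketch of an early check, assuming a hypothetical helper named `require_env` that is not part of the original notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    `require_env` is a hypothetical helper, not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        # Treat both "unset" and "empty string" as missing.
        raise RuntimeError(f"Environment variable {name!r} is not set; check your .env file.")
    return value
```

It could be used as `OPENAI_KEY = require_env("OPENAI_API_KEY")`. Whether `OPENAI_ENDPOINT` should also be mandatory depends on the setup, since the client can fall back to its default base URL when none is given.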

### 🚀 Initializing the OpenAI Client & Chroma DB (In-Memory)

In [7]:
client = OpenAI(
    base_url=OPENAI_ENDPOINT,
    api_key=OPENAI_KEY
)

chroma_client = chromadb.Client()
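# get_or_create_collection reuses the collection if it already exists,
# so re-running this cell in the same kernel does not fail.
collection = chroma_client.get_or_create_collection(name=CHROMA_COLLECTION_NAME)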

### 📚 Helper Functions

In [18]:
def get_embedding(text):
    """Generate a vector embedding for a string using the configured OpenAI-compatible client."""
    text = text.replace("\n", " ")  # collapse newlines before embedding
    response = client.embeddings.create(
        input=[text],
        model=EMBEDDING_DEPLOYMENT
    )
    return response.data[0].embedding

In [23]:
len(get_embedding("Who is the current chief minister"))

1536
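
The length check above confirms that a text maps to a 1536-dimensional vector. Retrieval relies on comparing such vectors, and cosine similarity is one common way to do that. The sketch below uses toy 3-dimensional vectors in place of real embeddings; the function name is an assumption for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real 1536-dimensional embeddings.
print(cosine_similarity([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]))  # identical direction
print(cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # orthogonal vectors
```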

In [9]:
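def split_sections(text: str):
    """Split text into sections, treating each line that ends with ':' as a header.

    Text before the first header is discarded. Returns a list of dicts with
    "header" and "content" keys.
    """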
    sections = []
    current_header = None
    current_lines = []

    for line in text.splitlines():
        if line.strip().endswith(":"):  # header line
            # save previous section
            if current_header is not None:
                sections.append({
                    "header": current_header.replace(":",""),
                    "content": "\n".join(current_lines).strip().replace("\uf0b7", "")
                })
            # start new section
            current_header = line.strip()
            current_lines = []
        else:
            current_lines.append(line)

    # last section
    if current_header is not None:
        sections.append({
            "header": current_header,
            "content": "\n".join(current_lines).strip().replace("\uf0b7", "")
            
        })

    return sections
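
`split_sections` treats any line ending in a colon as a header and drops whatever text precedes the first such line. The following is a sketch of a variant that keeps that leading text under a placeholder header. The function name and the "Preamble" label are assumptions, and unlike the original it skips empty sections and strips only the trailing colon.

```python
def split_sections_with_preamble(text: str, preamble_header: str = "Preamble"):
    """Variant of split_sections that keeps text appearing before the first header.

    Hypothetical sketch: the function name and the "Preamble" label are
    assumptions, not part of the original notebook.
    """
    sections = []
    current_header = preamble_header
    current_lines = []

    def flush():
        # Store the section collected so far, skipping empty ones.
        content = "\n".join(current_lines).strip().replace("\uf0b7", "")
        if content:
            sections.append({"header": current_header, "content": content})

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):  # same header heuristic as split_sections
            flush()
            current_header = stripped.rstrip(":")
            current_lines = []
        else:
            current_lines.append(line)
    flush()
    return sections


# Made-up sample text, used only to show the splitting behaviour.
sample = "Intro line\nUses:\n- first item\n- second item\nNotes:\nsome closing text"
for section in split_sections_with_preamble(sample):
    print(section)
```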

### 1️⃣ Loading PDF...

In [None]:
reader = PdfReader(PDF_FILE_PATH)
full_text = ""
for page in reader.pages:
    full_text += page.extract_text()
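
Concatenating pages with `+=` inserts no separator, so the last line of one page fuses with the first line of the next. A header that ends a page can then go undetected by the colon-based heuristic in `split_sections`. A minimal sketch of a newline-separated join, assuming a hypothetical helper named `join_page_texts` and a stand-in page class so that it runs without a PDF file:

```python
def join_page_texts(pages, separator="\n"):
    """Concatenate per-page text, skipping pages that yield no text.

    `pages` is any iterable of objects exposing an `extract_text()` method,
    such as `PdfReader(...).pages`. `join_page_texts` is a hypothetical helper.
    """
    texts = []
    for page in pages:
        text = page.extract_text() or ""  # treat None or empty results as no text
        if text.strip():
            texts.append(text)
    return separator.join(texts)


class FakePage:
    """Stand-in for a pypdf page so the sketch runs without a PDF file."""

    def __init__(self, text):
        self._text = text

    def extract_text(self):
        return self._text


print(join_page_texts([FakePage("Header on page one:"), FakePage(""), FakePage("Body on page three")]))
```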

In [None]:
full_text  # display the extracted text

### ✂️ Chunking Text...

In [None]:
sections = split_sections(full_text)
product_name = os.path.splitext(os.path.basename(PDF_FILE_PATH))[0].replace('_', ' ')
print(f"Total sections in {product_name}: {len(sections)}")
for section in sections:
    header = section['header']
    content = section['content']
    print(f"{header} - Length of the content: {len(content)}")

chunks = [f"{sect['header']} of {product_name}:\n{sect['content']}" for sect in sections if sect['content'] != '']

Total sections in foss: 50
questions were identified. These are - Length of the content: 7681
adoption at KITE - Length of the content: 1233
identified - Length of the content: 1844
include - Length of the content: 437
3. AePS (Aadhaar Enabled Payment System) - Length of the content: 1363
driven by the following factors - Length of the content: 1530
adopting FOSS - Length of the content: 956
include - Length of the content: 5576
benefits from adopting FOSS solutions - Length of the content: 6811
are listed below - Length of the content: 1485
major benefits that Razorpay derives from FOSS - Length of the content: 4010
from adopting FOSS solutions - Length of the content: 555
that FOSS adoption brings with it some challenges - Length of the content: 4967
from adopting open source solutions - Length of the content: 3779
the following benefits using FOSS - Length of the content: 4141
experienced the following benefits of FOSS - Length of the content: 2936
Students satisfying any one of the
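
The cells that follow read from a list named `new_chunks`, which is not defined in any cell shown here. One plausible way to derive it, assuming it came from splitting oversized entries of `chunks` into overlapping windows, is sketched below. The function name, the 1000-character limit and the 100-character overlap are all assumptions, not values taken from the notebook.

```python
def split_long_chunks(chunks, max_chars=1000, overlap=100):
    """Split any chunk longer than max_chars into overlapping windows.

    Hypothetical sketch: the function name and both default values are
    assumptions, not taken from the original notebook.
    """
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be non-negative and smaller than max_chars")
    step = max_chars - overlap
    result = []
    for chunk in chunks:
        if len(chunk) <= max_chars:
            result.append(chunk)  # short chunks pass through unchanged
            continue
        start = 0
        while start < len(chunk):
            result.append(chunk[start:start + max_chars])
            if start + max_chars >= len(chunk):
                break  # this window reached the end of the chunk
            start += step
    return result


# Toy input standing in for the `chunks` list built above.
demo_chunks = ["short chunk", "x" * 2500]
new_chunks = split_long_chunks(demo_chunks)
print([len(c) for c in new_chunks])
```

Character windows are the simplest option. Splitting on sentence or paragraph boundaries would probably keep each chunk more coherent, at the cost of more code.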

### 💾 Generating Embeddings & Storing...

In [64]:
ids = [str(i) for i in range(len(new_chunks))]
embeddings = []

# Loop through chunks and generate embeddings (batching is preferable in production)
for i, chunk in enumerate(new_chunks):
    vec = get_embedding(chunk)
    embeddings.append(vec)
    if i % 5 == 0:
        print(f"   -> Processed {i+1}/{len(new_chunks)} chunks...", end="\r")


   -> Processed 56/57 chunks...
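
The loop above makes one embedding request per chunk. As its comment notes, batching is usually preferable. A minimal sketch of order-preserving batching, where the embedding call is passed in as a function so the example runs offline. The names `embed_in_batches` and `fake_embed_batch` and the default batch size are assumptions for illustration.

```python
def embed_in_batches(texts, embed_batch, batch_size=16):
    """Embed texts in fixed-size batches, preserving input order.

    `embed_batch` is any callable that maps a list of strings to a list of
    vectors in the same order. The name `embed_in_batches` and the default
    batch size are assumptions for illustration.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    vectors = []
    for start in range(0, len(texts), batch_size):
        # Same newline handling as get_embedding, applied per text.
        batch = [t.replace("\n", " ") for t in texts[start:start + batch_size]]
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise ValueError("embed_batch returned a different number of vectors")
        vectors.extend(batch_vectors)
    return vectors


def fake_embed_batch(batch):
    """Offline stand-in embedder: one number per text (its length)."""
    return [[float(len(text))] for text in batch]


print(embed_in_batches(["a", "bb", "ccc"], fake_embed_batch, batch_size=2))
```

With the real client, `embed_batch` could wrap `client.embeddings.create(input=batch, model=EMBEDDING_DEPLOYMENT)` and return `[item.embedding for item in response.data]`. The workable batch size depends on the provider's request limits.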

In [65]:
collection.add(
    documents=new_chunks,
    embeddings=embeddings,
    ids=ids
)
print("\n   -> Indexing complete!")


   -> Indexing complete!


### 🧠 RETRIEVAL & GENERATION LOOP

In [None]:
user_queries = [
    "What is the packing variants of airent -3?",
    "What is the dosing of classic airent 3 needed for 25 kg cement?",
    "How to use classic airent?",
]
rag_system_prompt = """You are a helpful assistant. Use the provided context to answer the question.
    If the answer is not in the context, say you don't know."""
    

common_system_prompt = """You are a helpful assistant with extensive experience in construction and construction chemicals. Using your knowledge, answer the question."""
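
A triple-quoted prompt carries its source indentation and line breaks into the text sent to the model. A small sketch of a helper that normalises the system prompt's whitespace and assembles the message list in one place. `build_rag_messages` and the exact message layout are assumptions modelled on the prompts above.

```python
def build_rag_messages(system_prompt, context_chunks, question):
    """Assemble chat messages for a context-grounded question.

    Hypothetical helper: the name and the exact message layout are
    assumptions, modelled on the prompts defined above.
    """
    # Collapse newlines and indentation left over from triple-quoted strings.
    clean_system_prompt = " ".join(system_prompt.split())
    context = "\n\n".join(context_chunks)
    user_message = f"Context:\n{context}\n\nQuestion:\n{question}"
    return [
        {"role": "system", "content": clean_system_prompt},
        {"role": "user", "content": user_message},
    ]


# Placeholder strings stand in for real retrieved chunks.
messages = build_rag_messages(
    "You are a helpful assistant.\n    Answer only from the context.",
    ["first retrieved chunk", "second retrieved chunk"],
    "How to use classic airent?",
)
for message in messages:
    print(message["role"], "->", repr(message["content"]))
```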


In [95]:
for ch in new_chunks:
    if "KITE".lower() in ch.lower():
        print(ch)
        break

questions were identified. These are of foss:
1. How and to what extent are organisations 
using FOSS?
2. What are the benefits (tangible and non-
tangible) they experience by virtue of adopting 
FOSS?
3. What are the challenges of working with FOSS?
4. What are the factors behind the organisation’s 
choice of software?
5. What potential legal and policy measures can 
support and promote FOSS in India?
As indicated earlier, within our mixed methods 
research framework, we adopted the case study 
approach to comprehensively address these research 
questions. To build methodologically rigorous case 
studies, we prepared a detailed, semi-structured 
questionnaire.
While it would have been preferable to build case 
studies from all sectors, we had to limit our case 
studies to four sectors (healthcare, education, 
finance and software and IT services) due to 
time and resource constraints. However, efforts 
were made to ensure greater diversity by trying 
to have four categories in each 

In [None]:
for query in user_queries:
    query_vec = get_embedding(query)
    results = collection.query(
        query_embeddings=[query_vec],
        n_results=3
    )
    retrieved_context = "\n\n".join(results['documents'][0])
    print(f"\nRetrieved Context for {query}:\n{retrieved_context}")

    rag_user_message = f"""
        Context:
        {retrieved_context}

        Question: 
        {query}
        """
    rag_chat_response = client.chat.completions.create(
        model=CHAT_DEPLOYMENT,
        messages=[
            {"role": "system", "content": rag_system_prompt},
            {"role": "user", "content": rag_user_message}
        ],
    )
    common_user_message = f"""Question: {query}"""
    common_chat_response = client.chat.completions.create(
        model=CHAT_DEPLOYMENT,
        messages=[
            {"role": "system", "content": common_system_prompt},
            {"role": "user", "content": common_user_message}
        ],
    )
    print(f"\n🤖 Answer: {rag_chat_response.choices[0].message.content}")
    print(f"\n🤖 Common Answer: {common_chat_response.choices[0].message.content}")
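
Conceptually, `collection.query` ranks the stored vectors by distance to the query vector and returns the closest `n_results`. The sketch below shows that ranking as a brute-force scan, assuming squared Euclidean distance as the metric. The actual index may be configured with a different metric and typically uses an approximate nearest-neighbour search rather than a full scan. The function name and the toy 2-dimensional vectors are assumptions.

```python
def top_k_by_distance(query_vec, doc_vecs, k=3):
    """Return (index, squared L2 distance) pairs for the k nearest document vectors."""

    def squared_l2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    scored = [(i, squared_l2(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional vectors standing in for real embeddings.
docs = ["chunk A", "chunk B", "chunk C", "chunk D"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6], [-1.0, 0.0]]
query_vec = [1.0, 0.0]
for index, distance in top_k_by_distance(query_vec, doc_vecs, k=2):
    print(docs[index], round(distance, 3))
```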
    


---
### 🙋‍♂️ Question
`What is the HR payroll for Proprietary software?`

<details>
<summary><strong>Retrieved Context</strong></summary>

implementation of foss:<br>1. The long-term survivability of FOSS projects is <br>contingent upon them being associated with a <br>foundation that provides support and oversees <br>its governance. Otherwise, they risk becoming <br>EOL if the project’s originators do not see <br>value in continuing to maintain them. This <br>has an adverse impact on downstream projects.<br>2. FOSS projects that drive impact both for <br>businesses and society are not necessarily <br>the ones that are attractive for developers <br>to work on. This hinders the uptake of their <br>development.<br>3. Universities have a preference towards having <br>students working on end-to-end software <br>projects. In contrast, contributing to FOSS <br>projects involves numerous minor bug fixes and <br>code improvements to a large project. Hence, <br>universities are unwilling to recognise the <br>contributions to FOSS projects for fulfilment <br>of criteria for student internships, as the work <br>done cannot be easily evaluated. In this context, <br>it needs to be added that, in reality, end-to-<br>end software projects as a category are seldom <br>implemented, while work done towards FOSS <br>contribution is often valuable.<br>4. When new workers enter the job market <br>without adequate exposure to FOSS, they find <br>it difficult to understand and work with FOSS, <br>given its distributed architecture.36<br>Economic Value Addition<br>Dhiway spends $10,000 annually on proprietary <br>software for their internal needs, which includes <br>Zoho Suite, Google Workspace and GreytHR. <br>They estimate that using managed FOSS <br>equivalents would cost them only around $8000. <br>They also point out that developing equivalent <br>solutions in-house would be expensive, and they <br>estimate the cost to be around $15,000.<br> <br>Thoughtworks<br>Thoughtworks is a leading global technology <br>consultation organisation founded in Chicago in <br>1993. 
The organisation pioneered the application of <br>agile software development to globally distributed <br>teams known as distributed agile to develop custom <br>software solutions for its clients.<br>The organisation is an extensive user of, and <br>contributor to, open source since its inception. It <br>views the philosophical concept of open source as <br>a driver of software quality that provides the ability <br>to build superior solutions.<br>In the span of just 30 years, the organisation <br>has grown exponentially to more than 10,000 <br>employees, and it has 48 offices across 19 <br>countries.36 The company generated a revenue of <br>$1,126 million for the F.Y. 2023 with the figure <br>for the Asia-Pacific region being $387 million.37<br>Note: The information garnered for this case <br>study, though relating to Thoughtworks, is specific <br>to the Digital Public Goods (DPG) vertical of the <br>organisation. This vertical typically caters to clients <br>like governments, foundations and NGOs, and <br>multilaterals like the World Bank. It also operates <br>with lower revenues and generates lower profits, <br>with its operations slightly overlapping with the <br>CSR function.<br>FOSS at Thoughtworks<br>Thoughtworks was ranked among the top 25 <br>contributors to GitHub by the Open Source <br>Contributor Index. 38 They have also contributed <br>to the development of 14 DPGs, key among them <br>being the development of Bahmni,39 an open source <br>hospital management system used in more than 50 <br>countries. The organisation has also utilised CSR <br>funds to finance FOSS projects.<br>The development of the following open source <br>tools, popularly used across the industry today, was <br>also undertaken by the organisation.<br>1. Selenium: A suite of tools for web browser <br>automation, used for testing web applications.<br>2. CruiseControl: A Java-based framework for <br>continuous integration.<br>3. 
Mingle: A project management and <br>collaboration solution.<br>Since the organisation primarily builds software for <br>clients, the choice of the type of software component <br>largely depends on client-induced constraints like <br>delivery timeline, budget, as well as operating and <br>maintenance costs. While most clients are agnostic <br>36. ‘Our History’ (Thoughtworks) <https://www.thoughtworks.com/en-in/about-us/history> accessed 3 February 2025.<br>37.‘Thoughtworks Reports Fourth Quarter and Full Year 2023 Financial Results’ (Thoughtworks) <https://investors.thoughtworks.com/news-releases/news-<br>release-details/thoughtworks-reports-fourth-quarter-and-full-year-2023-financial/> accessed 3 February 2025. <br>38.  ‘OSCI – Open Source Contributor Index’ <https://opensourceindex.io/> accessed 3 February 2025.<br>39.  ‘BahmniTM Open Source EMR & Hospital Information System (HMIS)’ (Bahmni) <https://www.bahmni.org> accessed 3 February 2025.37<br>to the underlying technology stack of their software <br>solutions, those keen on providing a better user <br>experience to their customers prefer a custom-<br>built solution that differentiates them from their <br>competition.<br>The DPG vertical’s clients, on the other hand, <br>generally prefer FOSS-based software solutions <br>that are then hosted in their own data centres on <br>basic commodity hardware. These include UIDAI, <br>NPCI, ONDC etc., while some UN agencies use <br>proprietary software.<br>The organisation’s preference for FOSS is a <br>consequence of multiple factors including the <br>accessibility of available codebases and FOSS’ <br>fitments for custom software development, and is <br>aided by the geek culture of the organisation where <br>engineers enjoy working with FOSS. They regularly <br>organise ‘Geek nights’, a form of internal hackathon <br>where employees are provided a problem statement <br>and tasked with developing a solution. 
Employees <br>are also provided with a budget for R&D and self-<br>learning, depending on the financial situation of <br>the organisation and market conditions.<br>The organisation estimates that about 20% of their <br>developed code is reused internally. However, this <br>is only an estimate and not a calculated metric.<br>The organisation uses proprietary vendor-based <br>solutions for internal software requirements despite <br>sufficient interest amongst developers to build <br>these applications.<br>Benefits and Challenges<br>Our conversations indicate that the following are <br>the major benefits seen by Thoughtworks in using<br><br> #### followed by the following technical factors of foss:<br>1. Feasibility of integration of the component <br>with existing/ external systems.<br>2. Ease of usage of the component.<br>3. Extent of customisation possible.<br>4. Favourable licensing terms (this refers <br>to a preference for permissively licensed <br>components).53<br>Start-ups, followed by non-profits, reported <br>considering the largest number of business <br>or organisational factors. In this aspect, cost <br>effectiveness of the component and the presence of <br>an active community surrounding the project were <br>the most important factors for organisations when<br>evaluating FOSS components. The availability <br>of support, either from the community or as <br>a managed service, featured as the next most <br>important factor followed by the extent of use of <br>the component in the industry.<br>The presence of a community being evaluated as <br>a factor (which is different from support from the <br>community) emerged as a new finding, indicating <br>that the community around an FOSS project fulfils <br>a purpose beyond providing technical support. 
It <br>could be an indicator for popularity of the project <br>or could reflect the value it provides, which in turn <br>could attract a larger community.<br>FOSS Alternatives<br>All of the organisations that responded to the <br>question about the presence of proprietary or <br>software developed in-house in their stack, <br>reported using either proprietary, licensed or SaaS <br>applications for various purposes. <br>We observe that proprietary software is largely <br>employed for two main purposes, namely, email <br>services, and/ or workspace collaboration tools and <br>internally used applications.<br>The reasoning provided for this included the lack <br>of equivalent open source alternatives due to the <br>complexity of the application, and attempts by <br>large email providers to maintain a leverage in the <br>market by disincentivising emails not originating <br>from their servers.<br>As indicated earlier, many organisations also <br>use proprietary software or SaaS applications <br>for internal needs, i.e., solutions used within the <br>organisation by employees or for its operations. <br>This includes instant messaging, HR, finance, <br>payroll, ERP, etc. The reasoning adopted in this<br><br> #### sample of foss:<br>1. Revenue loss due to free-riding competitors.<br>2. Testing required due to lack of certification <br>for FOSS.<br>3. Switching costs from proprietary solutions.<br>Table 3. Top Challenges of FOSS<br>SL.NO. 
CHALLENGE ORGANISATIONS THAT REPORTED <br>THE CHALLENGE<br>T racking and applying patches and <br>updates<br>Zerodha, Razorpay, [A private bank], PupilFirst, <br>KG Hospital<br>Availability of personnel with skills in <br>FOSS technologies [A private bank], T ech4Good, Dhiway, NxtGen<br>Lack of community support Zerodha, Razorpay, Pocket ATM, T ech4Good, <br>PupilFirst<br>Performing maintenance and support Remiges, Kalvium<br>Integration/ compatibility with existing <br>IT systems KITE, T ech4Good, Pocket ATM<br>Lack of contributions to FOSS Swasth, Dhiway, OHC<br>No revenue model to sustain FOSS <br>projects ThoughtWorks, Swasth, PupilFirst<br>T roubleshooting issues with FOSS NPCI, NxtGen<br>3<br>4<br>5<br>6<br>7<br>8<br>2<br>159<br>66. Vikrant Narayan Vasudeva, Open Source Software and Intellectual Property Rights (Kluwer Law International 2014).<br>67. ‚ÄòFrequently Asked Questions about the GNU Licenses - GNU Project - Free Software Foundation‚Äô (GNU Operating System) <https://www.gnu.org/licenses/<br>gpl-faq.en.html#TradeSecretRelease> accessed 22 January 2025. <br>68. Eric S. Raymond, ‚ÄòAnnouncement of ‚ÄúOSI Certified‚Äù Open Source Mark‚Äô (Open Source Initiative, 16 June 1999) <https://opensource.org/pressreleases/<br>certified-open-source-php> accessed 21 January 2025.<br>69. ibid. <br>70. Andres Guadamuz, ‚ÄòLegal Challenges to Open Source Licenses‚Äô (2005) Script-ed <https://era.ed.ac.uk/handle/1842/2272> accessed 17 December 2024.<br>71. Richard Stallman, ‚ÄòFighting Software Patents - Singly and Together‚Äô (GNU Operating System) <https://www.gnu.org/philosophy/fighting-software-patents.<br>en.html> accessed 26 January 2025.<br>72. Kirk Rowe, ‚ÄòWhy Pay for What‚Äôs Free?: Minimizing the Patent Threat to Free and Open Source Software‚Äô (2008) 7 The John Marshall Review of <br>Intellectual Property Law <https://repository.law.uic.edu/ripl/vol7/iss3/9> accessed 3 March 2025. As cited in Vasudeva (n 66). 
<br>IP and Licensing<br>Five types of IP are relevant in the context of <br>software in many jurisdictions: trade secret, <br>copyright, patent, designs, and trademark. 66 <br>However, not all these forms of IP protection are <br>utilised by individuals and organisations uniformly <br>across software, as their usage depends on the <br>developers‚Äô preferences as well as on applicable <br>national laws.<br>The key critique of the traditional IP protection <br>system in the context of FOSS is that it inherently<br>conflicts with basic FOSS principles. For instance, <br>trade secrets protect confidential information,and <br>this is against the basic ethos of open source. In <br>fact, the FSF explicitly regards trade secrets as a <br>GPL (General Public License) violation.67<br>However, it needs to be noted that the open source <br>community has also tried to leverage some forms <br>of IP protection in a unique manner to protect <br>themselves at times. For instance, trademark is <br>a sign used for protecting source identification, <br>building consumer trust, and thereby also building <br>a brand, in the long term. So, while giving up <br>other forms of IP protection such as copyright, <br>patents and trade secrets, an organisation may rely <br>on their trademark-related rights to maintain the <br>competitive edge in the market. While the term <br>‚Äòopen source‚Äô cannot be trademarked because of its <br>descriptive character,68 many members of the open <br>source community can be now seen leveraging <br>certification marks such as ‚ÄòOSI Certified‚Äô, 69 <br>administered by the OSI, to signal compliance <br>with open source principles.<br>The organisations surveyed for the study showed <br>an awareness of issues related to licensing and <br>patents but reported minimal direct challenges <br>with IP protection. This could be partly due to the<br>absence of any major litigations in India in the <br>area of FOSS. 
However, it also needs to be added <br>here that many respondents in our study noted <br>the need for careful consideration of licensing <br>terms and conditions, thereby indirectly indicating <br>consciousness about the underlying IP issues.<br>Challenges with regard to patent protection<br>Patents can grant exclusive control over inventions, <br>and their application in the realm of software has <br>always been a contentious issue, particularly in the <br>context of FOSS principles. 70 Richard Stallman <br>equates software patents to land mines, where <br>each design decision risks legal repercussions. 71 <br>Open source proponents seek to revisit patent <br>jurisprudence in the context of software programs <br>altogether, citing it as an ‚Äòundeserved reward‚Äô. 72 <br>They point to issues such as the highly collaborative 60<br>and incremental character of innovations in the <br>area of software, the challenges associated with <br>prolonged patent terms, and the high potential for <br>abuse in the patent system.73<br>The open source community has developed some <br>strategies to mitigate some of these challenges. 74 <br>They include integrating patent clauses in licenses, <br>open patent movement, patent promises,75 creation <br>of patent pools, 76 creating prior art databases, 77 <br>promotion of rigorous prior art examination and <br>defensive publication strategies.78<br>However, patents still pose considerable challenges <br>for FOSS by virtue of factors such as the high cost of <br>patent litigation (making it difficult for most FOSS <br>projects to defend themselves against infringement <br>claims) and the existence of patent trolls who <br>seek patents merely for the purpose of extorting <br>money from others. 
The large number of patent <br>applications filed in many of the jurisdictions and <br>the manner in which patent applications are often <br>drafted (concealing the fact that the invention in <br>question is a software) poses additional challenges <br>for FOSS community.<br>As highlighted by Shuvam Misra of Remiges,    <br>patents remain a persistent concern for creators of <br>both open and closed-source software, necessitating <br>proactive risk mitigation strategies. The increasing <br>awareness of this threat within the FOSS <br>community has led organisations to offer protection <br>mechanisms. For instance, Red Hat assumes <br>liability for lawsuits related to their offerings, <br>shielding customers from legal consequences by <br>taking responsibility for contesting such cases in <br>court.<br>The open source community has also raised <br>concerns about the long-term impact on innovation, <br>particularly for smaller developers and companies.79 <br>In some instances, entire projects would have <br>to be halted due to a minor infringement claim <br>from the patent-holders of a proprietary program. <br>While cross-licensing (mutual exchange of patent <br>licenses rights between two or more parties) is a <br>mitigation tactic, it is understood to mostly benefit <br>large corporations as very few FOSS projects have <br>patents to trade.80<br>Moreover, empirical research suggests that IPR <br>enforcement actions can negatively impact FOSS <br>projects by decreasing user interest and developer <br>activity.81 For example, in the context of the SCO <br>v. IBM law suit, one study points out that the user <br>interest (measured by project downloads) showed <br>substantial decline following the initiation of the <br>suit. More specifically, after the lawsuit was filed, <br>FOSS projects having a high technology overlap <br>73. 
Malcolm Bain and P McCoy Smith, ‚ÄòPatents and the Defensive Response‚Äô in Amanda Brock (ed), Open Source Law, Policy and Practice (Oxford University <br>Press 2022) <https://doi.org/10.1093/oso/9780198862345.003.0010> accessed 23 January 2025; Vasudeva (n 66).<br>74. See Bain and McCoy Smith (n 73); Vasudeva (n 66).<br>75. Red Hat <https://www.redhat.com/en/about/patent-promise> accessed 3 March 2025. See also: Google; IBM, Microsoft. <br>76. For instance, Open Invention Network and License on Transfer Network.<br>77. Projects like Open Source as Prior Art and Peer to Patent.<br>78. Platforms like Technical Disclosure Commons.<br>79. Vasudeva (n 66).<br>80. Bain and McCoy Smith (n 73). Vasudeva (n 66).<br>81. Wen Wen, Chris Forman and Stuart JH Graham, ‚ÄòResearch Note: The Impact of Intellectual Property Rights Enforcement on Open Source Software <br>Project Success‚Äô (2013) 24 Information Systems Research 1131.61<br>with the concerned software witnessed around 15-<br>16% decline in monthly downloads, as compared <br>to a control group. Data also indicates a substantial <br>decline in developer activity and illustrates that in <br>FOSS projects with high technology overlap with <br>the concerned software, FOSS projects experienced <br>a 45% decrease in developer activity in comparison <br>to a control group.82<br>However, as indicated earlier, during our <br>interactions with different organisations as part of <br>this study, most organisations did not report facing <br>any direct software patent-related challenges. Only<br>one organisation reported a challenge they faced <br>relating to a cryptographic library patent. It needs <br>to be added that some organisations expressed <br>concerns about patents being a persistent issue, <br>emphasising the need for vigilance. 
In other words, <br>even if direct negative experiences in this area are <br>limited in the case of Indian organisations (probably <br>due to the restrictions on software patents under <br>the Patents Act, 1970 of India), they may still be <br>causing a chilling effect in the area.<br>Copyright Law<br>Copyright law plays the most important role <br>in the context of software. It treats software as a <br>‚Äòliterary work‚Äô, protected under Indian copyright <br>law. Some of the open movements like Creative <br>Commons have used the copyright framework <br>in creative ways to ensure wider dissemination of <br>software and other subject matters covered under <br>copyright law. As copyright protection is automatic <br>(no registration is required for getting protection) <br>in most jurisdictions, they achieve the objective of <br>broader dissemination of such works by providing <br>easy-to-use and easy-to-understand licenses that <br>allow a broad range of activities. In some instances, <br>this could mean relinquishing the entire copyright, <br>and in most instances, the developers would <br>only retain the specific right they wish to retain. <br>Attribution is one of such rights.<br>However, due to the fact that there are different <br>licensing options currently available, there is also <br>considerable divergence on licensing terms, despite <br>agreement on the OSS philosophy. To reduce <br>conflict and promote the growth of OSS, groups <br>like the OSI 83 and the FSF 84 have set definitional <br>standards for these licenses. 
The different licenses <br>can be classified in various ways, such as in terms of <br>control (academic vs permissive vs partially closable <br>vs reciprocal licenses) 85 and in terms of their <br>functional differences (permissive vs restrictive vs <br>highly restrictive licenses).86 <br>Among the different open source license options, it <br>is observed that the two licenses 87 most often used <br>by developers (based on the number of unique <br>pushes to GitHub) in India for 2024 (Q3) are <br>MIT88 and Apache-2.0.89 <br>82. ibid.<br>83. ‚ÄòLicenses‚Äô (Open Source Initiative) <https://opensource.org/licenses> accessed 28 December 2024.<br>84. ‚ÄòFSF Licensing & Compliance Team‚Äô (Free Software Foundation) <https://www.fsf.org/licensing/> accessed 26 January 2025.<br>85. In terms of control over the software. See Van Lindberg, Intellectual Property and Open Source (O‚ÄôReilly 2008).<br>86. Maryna Manteghi, ‚ÄòUnderstanding Open Source and Free Software Licensing Mechanism: A Close Review of the Alternative Approach to Traditional <br>Notions of Software Licensing‚Äô (2017) SSRN Electronic Journal <https://www.ssrn.com/abstract=3082313> accessed 12 December 2024.<br>87.  ‚ÄòIN | GitHub Innovation Graph‚Äô <https://innovationgraph.github.com/economies/in#git-pushes> accessed 3 March 2025; ‚ÄòInnovationgraph/Data/Licenses.<br>Csv at Main ¬∑ Github/Innovationgraph‚Äô (GitHub) <https://github.com/github/innovationgraph/blob/main/data/licenses.csv> accessed 3 March 2025.<br>88.  ‚ÄòMIT License‚Äô <https://mit-license.org/> accessed 31 January 2025. <br>89.  ‚ÄòApache License, Version 2.0‚Äô <https://www.apache.org/licenses/LICENSE-2.0> accessed 31 January 2025. 62<br>Table 4. 
Popular Open Source Licenses by Nature of Rights Granted<br>LICENSE MIT<br>APACHE <br>LICENSE <br>2.0     <br>(AL 2.0)<br>MOZILLA <br>PUBLIC <br>LICENSE <br>2.0    <br>(MPL 2.0)<br>GNU    <br>LESSER <br>GENERAL <br>PUBLIC   <br>LICENSE <br>v3.0 <br>(LGPL)<br>GNU <br>GENERAL <br>PUBLIC <br>LICENSE <br>v3.0  <br>(GPL)<br>GNU <br>AFFERO <br>GENERAL <br>PUBLIC <br>LICENSE <br>v3.0 <br>(AGPL)<br>PERMISSIONS<br>Commercial use for <br>licensed material and <br>derivatives<br>                                             <br>Distribution                                              <br>Modification                                              <br>Private use                                              <br>Express grant of Patent <br>rights from contributor <br>to recipient<br>                                         <br>CONDITIONS<br>Disclosure of source <br>code when distributing <br>the software<br>                                   <br>Copy of license and <br>copyright notice                                               <br>90. GitHub, Inc. ‚ÄòLicenses‚Äô (Choose a License) <https://choosealicense.com/licenses/> accessed 23 January 2025.<br>91. Carnegie Mellon University CTTEC, ‚ÄòOpen Source License Comparison Grid‚Äô <https://www.cmu.edu/cttec/forms/opensourcelicensegridv1.pdf> accessed 3 <br>March 2025.<br>Table 4 compares some of the popular open source <br>licenses, in terms of the nature of rights granted <br>to the users. 
It uses the information provided in <br>the Choose A License Appendix 90 and the Open <br>Comparison Grid, released by the Center for <br>Technology Transfer and Enterprise Creation, <br>Carnegie Mellon University.9163<br>    MIT    AL 2.0         MPL 2.0       LGPL        GPL       AGPL<br>Users who interact with <br>the software via network <br>are given the right to <br>receive a copy of the cor-<br>responding source code<br>         <br>Modification to be <br>released under the same <br>license;<br>in some cases similar or <br>related licenses may be <br>used<br>(Modifica-<br>tion of files)<br>(This condition <br>may not apply <br>to works that <br>use the licensed <br>material as a <br>library)<br>       <br>Indicate changes made <br>to the code<br>LIMITATIONS<br>Software without war-<br>ranty, and no liability <br>for damages<br>                                             <br>Explicitly states no <br>grant of trademark <br>rights<br>              <br>However, using licenses is not without its <br>challenges. Some of the specific challenges in the <br>FOSS context are:92<br>1. License proliferation (excessive number of open <br>source licenses) leading to compliance issues, <br>confusion and incompatibility between licenses <br>hindering collaboration and code reuse.93<br>2. The possibility of different jurisdictions <br>interpreting licenses inconsistently, leading to <br>compliance challenges.<br>3. Enforcement costs, particularly for smaller <br>organisations.<br>‚ÄòThe more liberal the licensing, the easier the choice.‚Äô<br>Abhishek Jain, CPTO, Swasth<br>92. See Vasudeva (n 66); Amanda Brock (ed), Open Source Law, Policy and Practice (2nd edn, Oxford University Press 2022) <https://academic.oup.com/<br>book/44727> accessed 22 January 2025; Lindberg (n 85) ch 7; Noam Shemtov and Ian Walden (eds), Free and Open Source Software: Policy, Law, and <br>Practice (1st edn, Oxford University Press 2013).<br>93. 
Robert Gomulkiewicz, ‘Open Source License Proliferation: Helpful Diversity or Hopeless Confusion?’ (2009) 30 Washington University Journal of Law & <br>Policy 261.<br>Our case studies also show a strong preference <br>for permissive licenses like Apache and MIT. It <br>needs to be added that many of the organisations <br>explicitly expressed that they avoid GPL licenses, <br>citing concerns about its incompatibility with their <br>licenses and its restrictive licensing conditions. <br>One organisation clarified by adding that the <br>viral nature of licenses such as the GNU GPL is a <br>challenge for their FOSS adoption. <br>Additionally, out of the 12 organisations in our <br>sample who released the software they developed as <br>open source, we observe that there is roughly 50:50 <br>split between permissive and restrictive licensing.94 <br>‘Software should be open so that others <br>can build on top of what I have done, <br>and I can build on what others have done.’<br>Shuvam Misra, Founder-Chairman, Remiges<br>During our interactions, it was observed that <br>licensing conditions play a role in the <br>decision-making process and six organisations reported <br>instances wherein licensing conditions led them <br>to reject an FOSS component. While five of them <br>mentioned restrictive licensing conditions as the <br>reason, one organisation reported ambiguity in <br>license terms, which can lead to increased overhead, <br>as a legal team is required to provide clarity.<br>It may also be highlighted here that one healthcare <br>organisation emphasised prioritising quality and <br>suitability of the solution for patient care over a <br>sub-optimal solution to avoid licensing fees, indicating <br>an instance wherein operational priorities outweigh <br>licensing concerns.<br>94. While some of them mentioned the licenses used, for others, the same has been taken from their individual projects as released on GitHub.<br>95. 
John Walsh, ‘What’s Driving Changes in Open Source Licensing?’ (DevOps.com, 8 March 2024) <https://devops.com/whats-driving-changes-in-open-source-licensing/> accessed 26 January 2025.<br>Shuvam Misra of Remiges mentioned a notable <br>example of the effect of viral license, MySQL’s <br>transition of its client libraries from LGPL to GPL <br>following its acquisition by Oracle. <br>With this change, Oracle database client libraries, <br>which are essential for connecting applications to <br>databases, became subject to GPL terms, requiring <br>developers to release their application as derivative <br>works under GPL. This shift created significant <br>challenges for enterprises relying on MySQL, as <br>it pressured them to purchase commercial support <br>agreements to avoid open sourcing their proprietary <br>software.<br>Distinction between GPL and LGPL: While <br>GPL applies its licensing requirements to all <br>derivative works, LGPL typically applies to shared <br>libraries without extending those requirements <br>to application code. This differentiation has <br>historically made LGPL the preferred licensing for <br>client libraries.<br>Vendor-driven FOSS projects are increasingly <br>facing competition from SaaS providers. It is <br>reported that these organisations engage in free <br>riding i.e., the use of OSS without contributing <br>anything in return. This has forced OSS vendors <br>such as MongoDB, Elastic, and Redis Labs to <br>modify their licenses in such a manner that restrict <br>the use of the software by third parties, or require <br>them to pay fees or share their modifications in <br>turn, thereby making the project less open.95<br>Most organisations in our study are aware of <br>potential licensing risks that may arise in the <br>future due to changes in licensing of a software <br>project. Seven organisations outlined specific risk <br>mitigation strategies they have adopted, with some <br>employing more than one method. 
The outlined <br>strategies include issuing advisories and alternative <br>suggestions, adopting older versions, using the <br>latest and best fork of the project, implementing <br>modular architectures, and conducting regular <br>audits using SBOM.<br>In a nutshell, despite minimal direct conflicts <br>with IP and licensing, organisations studied as <br>part of this report exercise consistent caution and <br>adopt risk management practices. Some of them <br>also anticipate a rapid change in licensing-related <br>challenges due to AI-driven code development.<br>‘As the world moves to AI-driven <br>software development, testing or code review <br>and documentation, intellectual property <br>[rights] for software [will] become obsolete <br>in the next three to five years.’<br>Dilip Asbe, MD, CEO, NPCI<br>Software Stack<br>Eight of the organisations analysed in this report <br>have provided their software stack, stating the <br>extent to which different categories of software are <br>used in their organisation. The details regarding <br>the same can be accessed in Table 5. To maintain <br>confidentiality, we have removed the names of the <br>organisations and sector details.<br>CATEGORY STARTUP STARTUP<br>ORGANISATION S1 S2<br>Proprietary <br>software FOSS <br>In-house <br>developed <br>software<br>Proprietary <br>software<br>FOSS In-house <br>developed software<br>Operating Systems 30% 70% 0% 0% 100% 0%<br>Web Servers 0% 100% 0% 0% 100% 0%<br>Middleware 50% 50% 0%<br>Cloud Native <br>Software <br>100% 0% 0% 70% 30% 0%<br>Development <br>Framework<br>0% 100% 0% 0% 100% 0%<br>Programming <br>Languages<br>0% 100% 0% 0% 100% 0%<br>Database <br>Management System<br>100% 0% 0% 0% 100% 0%<br>Data Visualisation<br>Messaging and <br>Queueing<br>Infrastructure <br>Automation<br>50% 50% 0%<br>Observability & <br>Monitoring<br>100% 0% 0% 0% 100% 0%<br>Access Control 100% 0% 0%<br>Networking 100% 0% 0%<br>Table 5. 
Software Stack Usage Across Organisations<br> ORGANISATION S1 S2<br>Proprietary <br>software FOSS <br>In-house <br>developed <br>software<br>Proprietary <br>software<br>FOSS<br>In-house <br>developed <br>software<br>CI/ CD 0% 100% 0% 50% 50% 0%<br>AI/ ML <br>Security Tools 0% 100% 0%<br>ERP <br>CRM 100% 0% 0% 0% 0% 100%<br>CMS 0% 0% 100%<br>Ticketing/ Workflow <br>Management System <br>100% 0% 0% 0% 0% 100%<br>MIS <br>LMS 100% 0% 0%<br>Accounting <br>& Finance <br>100% 0% 0% 100% 0% 0%<br>HR and Payroll 100% 0% 0% 100% 0% 0%<br>Project Management 100% 0% 0% 100% 0% 0%<br>Communication<br>(Email/ Instant <br>Messaging<br>or<br>Email/ Office Suite)<br>100% 0% 0%<br>API Integration 100% 0% 0%<br>CATEGORY STARTUP STARTUP<br>ORGANISATION S3 S4<br>Proprietary <br>software FOSS <br>In-house <br>developed <br>software<br>Proprietary <br>software<br>FOSS In-house <br>developed software<br>Operating Systems 10% 90% 0% 0% 100% 0%<br>Web Servers 0% 90% 10% 0% 85% 15%<br>Middleware 10% 45% 45% 0% 20% 80%<br>Cloud Native <br>Software <br>80% 0% 20% 0% 30% 70%<br>Development <br>Framework<br>10% 80% 10% 0% 100% 0%<br>Programming <br>Languages<br>0% 100% 0% 0% 100% 0%<br>Database <br>Management System<br>30% 70% 0% 0% 100% 0%<br>Data Visualisation<br>Messaging and <br>Queueing<br>Infrastructure <br>Automation<br>0% 50% 50% 0% 80% 20%<br>Observability & <br>Monitoring<br>50% 50% 0% 0% 100% 0%<br>Access Control 0% 100% 0% 0% 100% 0%<br>Networking 50% 50% 0% 30% 60% 10%<br>ORGANISATION S3 S4<br>Proprietary <br>software FOSS <br>In-house <br>developed <br>software<br>Proprietary <br>software<br>FOSS<br>In-house <br>developed <br>software<br>CI/ CD 50% 50% 0% 0% 80% 20%<br>AI/ ML 0% 70% 30% 0% 90% 10%<br>Security Tools 0% 50% 50% 0% 100% 0%<br>ERP 100% 0% 0%<br>CRM 50% 50% 0% 100% 0% 0%<br>CMS 0% 50% 50% 0% 100% 0%<br>Ticketing/ Workflow <br>Management System <br>0% 100% 0% 100% 0% 0%<br>MIS 0% 0% 100% 100% 0% 0%<br>LMS 0% 0% 100%<br>Accounting <br>& Finance <br>100% 0% 0% 
100% 0% 0%<br>HR and Payroll 100% 0% 0% 100% 0% 0%<br>Project Management 30% 50% 20% 0% 100% 0%<br>Communication<br>(Email/ Instant <br>Messaging<br>or<br>Email/ Office Suite)<br>100% 0% 0% 0% 100% 0%<br>API Integration <br>CATEGORY LARGE NON-PROFIT <br>ORGANISATION L1 NP1<br>Proprietary <br>software FOSS <br>In-house <br>developed <br>software<br>Proprietary <br>software<br>FOSS<br>In-house <br>developed <br>software<br>Operating Systems 1% 99% 0% 0% 100% 0%<br>Web Servers 0% 100% 0% 0% 100% 0%<br>Middleware 0% 90% 10%<br>Cloud Native <br>Software <br>0% 100% 0% 0% 100% 0%<br>Development <br>Framework<br>0% 100% 0% 0% 100% 0%<br>Programming <br>Languages<br>0% 100% 0% 0% 100% 0%<br>Database Management <br>System<br>0% 100% 0% 0% 100% 0%<br>Data Visualisation 0% 100% 0%<br>Messaging and <br>Queueing<br>0% 100% 0%<br>Infrastructure <br>Automation<br>0% 100% 0% 0% 100% 0%<br>Observability & <br>Monitoring<br>0% 100% 0% 0% 100% 0%<br>Access Control 0% 100% 0%<br>Networking 0% 100% 0%<br> ORGANISATION L1 NP1<br>Proprietary <br>software FOSS<br>In-house <br>developed <br>software<br>Proprietary<br> software<br>FOSS<br>In-house <br>developed<br> software<br>CI/ CD 0% 100% 0% 100% 0% 0%<br>AI/ ML 0% 100% 0%<br>Security Tools 0% 100% 0%<br>ERP 0% 100% 0%<br>CRM 0% 50% 50%<br>CMS 0% 100% 0%<br>Ticketing/ Workflow <br>Management System <br>0% 90% 10% 100% 0% 0%<br>MIS <br>LMS <br>Accounting <br>& Finance <br>100% 0% 0%<br>HR and Payroll 100% 0% 0%<br>Project Management 100% 0% 0%<br>Communication<br>(Email/ Instant <br>Messaging<br>or<br>Email/ Office Suite)<br>70% 30% 0% 100% 0% 0%<br>API Integration <br>CATEGORY NON-PROFIT MEDIUM<br>ORGANISATION NP2 M1<br>Proprietary<br> software<br>FOSS<br>In-house <br>developed <br>software<br>Proprietary <br>software FOSS<br>In-house <br>developed <br>software<br>Operating Systems 30% 70% 0% 5% 95% 0%<br>Web Servers 5% 95% 0%<br>Middleware 5% 95% 0%<br>Cloud Native <br>Software <br>Development <br>Framework<br>0% 100% 0% 
5% 95% 0%<br>Programming <br>Languages<br>0% 100% 0% 5% 95% 0%<br>Database Management <br>System<br>20% 80% 0%<br>Data Visualisation 0% 100% 0%<br>Messaging and <br>Queueing<br>Infrastructure <br>Automation<br>5% 95% 0%<br>Observability & <br>Monitoring<br>20% 80% 0%<br>Access Control 30% 70% 0%<br>Networking 0% 100% 0%<br> ORGANISATION NP2 M1<br>Proprietary <br>software FOSS<br>In-house <br>developed <br>software<br>Proprietary<br> software<br>FOSS<br>In-house<br>developed<br> software<br>CI/ CD 10% 90% 0%<br>AI/ ML 20% 80% 0%<br>Security Tools 0% 100% 0%<br>ERP 0% 100% 0%<br>CRM 0% 100% 0% 100% 0% 0%<br>CMS 0% 100% 0%<br>Ticketing/ Workflow <br>Management System <br>0% 100% 0% 0% 100% 0%<br>MIS 100% 0% 0%<br>LMS <br>Accounting <br>& Finance <br>100% 0% 0% 100% 0% 0%<br>HR and Payroll 100% 0% 0%<br>Project Management 0% 100% 0% 0% 100% 0%<br>Communication<br>(Email/ Instant <br>Messaging<br>or<br>Email/ Office Suite)<br>100% 0% 0% 90% 10% 0%<br>API Integration <br>Data was not provided by the organisation/ The stack is not used by the organisation<br>FOSS Policies and Industry <br>Expectations<br>The union government and various state <br>governments in India have formulated and <br>implemented policies to encourage the development <br>and adoption of FOSS. These can be divided into

</details>

| Mode | Response |
|------|----------|
| **🔍 RAG (Context Grounded)** | HR & payroll is entirely handled with proprietary software, i.e., 100% of that stack is proprietary. |
| **🌐 Common LLM (No Context)** | An example of an HR-and-payroll package that falls in the “proprietary software” category is greytHR (from Greytip Software). |

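As a rough sanity check on the context-grounded answer, the "HR and Payroll" rows in the retrieved context can be parsed directly. This is a minimal sketch and not part of the original pipeline: `hr_payroll_rows` and `parse_triples` are hypothetical names, the rows are copied by hand from the context above, and each group of three percentages is assumed to be (proprietary, FOSS, in-house) for one organisation that reported data.

```python
import re

# Rows copied by hand from the retrieved context above (assumption: each
# group of three percentages is proprietary / FOSS / in-house for one
# organisation that reported data for this category).
hr_payroll_rows = [
    "HR and Payroll 100% 0% 0% 100% 0% 0%",  # S1 and S2
    "HR and Payroll 100% 0% 0% 100% 0% 0%",  # S3 and S4
    "HR and Payroll 100% 0% 0%",             # L1 / NP1 table (one organisation reported)
    "HR and Payroll 100% 0% 0%",             # NP2 / M1 table (one organisation reported)
]


def parse_triples(row):
    """Split a flattened table row into (proprietary, foss, in_house) tuples."""
    values = [int(v) for v in re.findall(r"(\d+)%", row)]
    return [tuple(values[i:i + 3]) for i in range(0, len(values), 3)]


triples = [t for row in hr_payroll_rows for t in parse_triples(row)]

# True only if every reporting organisation is 100% proprietary here.
all_proprietary = all(t == (100, 0, 0) for t in triples)
print(len(triples), all_proprietary)  # prints: 6 True
```

All six reported entries are 100% proprietary, which is consistent with the RAG response in the table. The no-context response names a product that does not appear anywhere in the retrieved text.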