# Classification by LLM Model
- Classify Papers into 10 Categories Using OpenAI GPT Model("gpt-4.1-mini")  
- 10 Categories object
  - 1: AI- and Machine Learning Applications in IS  
  - 2: IS Security, Privacy, and Technology Adoption  
  - 3: Social Media and Online Community Behavior  
  - 4: Digital Transformation and IT Strategy in Organizations  
  - 5: Data Analytics and IS Performance  
  - 6: HCI, UX, and Interface Design in IS  
  - 7: ICT for Digital Inclusion and Social Equity  
  - 8: Platform Business Models and Digital Markets  
  - 9: Green IS, CSR, and Sustainability  
  - 10: IS Theories, Methods, and Meta-Research
  
> Used Libraries: Pandas, Langchain

In [75]:
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableParallel

import pandas as pd
from IPython.display import display_markdown

# CONFIG
import dotenv
import os
dotenv.load_dotenv()
os.environ["LANGSMITH_PROJECT"] = 'JAIS Paper Theme Classification Using LLM'

# models
from langchain_openai import ChatOpenAI
from langchain_google_genai import ChatGoogleGenerativeAI

True

In [35]:
# Read the CSV file
papers_df = pd.read_csv('JAIS_papers.csv', encoding='utf-8-sig')
papers_df.head()

Unnamed: 0,title,abstract,citation,url,year,volume,issue
0,Explaining Persistent Ineffectiveness in Profe...,Abstract\nOnline communities (OCs) have become...,"Recommended Citation\n\n\n Stein, Mari-Klar...",https://aisel.aisnet.org/jais//vol23/iss1/1/,2022,23,1
1,A Design Theory for Energy and Carbon Manageme...,Abstract\nEnergy and carbon management systems...,"Recommended Citation\n\n\n Zampou, Eleni; M...",https://aisel.aisnet.org/jais//vol23/iss1/2/,2022,23,1
2,Examining the Impacts of Airbnb Review Policy ...,"Abstract\nIn July 2014, Airbnb, one of the big...","Recommended Citation\n\n\n Mousavi, Reza an...",https://aisel.aisnet.org/jais//vol23/iss1/3/,2022,23,1
3,Inventing Together: The Role of Actor Goals an...,Abstract\nWith the ubiquity of the internet an...,"Recommended Citation\n\n\n Abhari, Kaveh; D...",https://aisel.aisnet.org/jais//vol23/iss1/4/,2022,23,1
4,The Design of a System for Online Psychosocial...,Abstract\nThe design of sensitive online healt...,"Recommended Citation\n\n\n Sjöström, Jonas;...",https://aisel.aisnet.org/jais//vol23/iss1/5/,2022,23,1


## Prompt Template

In [73]:
system_prompt = """
You are a classification assistant trained to identify the main research theme of academic papers published in the *Journal of the Association for Information Systems (JAIS)*.

You will be given the **title** and **abstract** of an IS (Information Systems) research paper.

Your task is to **assign exactly one of the following ten categories** that best represents the paper’s core theme. Each category is defined by a **specific scope of research problems, methods, or technological domain**.

## Category List

1. **AI- and Machine Learning-Driven Information System Design and Organizational Applications**  
   Research on how AI/ML technologies such as predictive models, algorithmic decision-making, or Generative AI are designed, deployed, or governed in IS contexts (e.g., business, healthcare, labor).

2. **IS Research on Information Security Behavior, Privacy Concerns, and Technology Adoption**  
   Covers behavioral theories (e.g., PMT, TPB), user responses to security policies, adoption of privacy-protecting technologies, and the paradox between personalization and privacy.

3. **User Behavior, Information Diffusion, and Community Dynamics in Social Media Platforms**  
   Studies related to user interactions, content sharing, social influence, misinformation, or platform governance (e.g., trolling, moderation, Q&A communities).

4. **Organizational-Level IS Strategy Research: Digital Transformation, IT Ambidexterity, and CIO Governance**  
   Research on digital innovation strategy, IT resource alignment, dynamic capabilities, and executive-level IT management.

5. **Data Analytics-Driven IS Performance and Value Creation for Organizations and Society**  
   Focus on the impact of analytics, big data, or data science on IS success, organizational performance, or public value, including systematic/meta analyses.

6. **Human–Computer Interaction, Interface Design, and User Experience (UX) in IS Research**  
   Includes studies on UI/UX design, cognitive aspects of interface use, and evaluation of digital agents, dashboards, or immersive systems (e.g., VR, fMRI studies).

7. **ICT Design for Digital Inclusion, Social Equity, and Access Among Marginalized Populations**  
   Explores how IS artifacts and ICTs support underserved users, promote fairness, or address digital divides (e.g., in fintech, education, governance).

8. **Digital Platform Business Models, User Engagement Mechanisms, and Market Transaction Design**  
   Focused on platform economics, pricing strategy, user-generated content, crowdfunding, and dual-sided market structure in digital platforms.

9. **Green IS, Corporate Social Responsibility, and Sustainable Information System Practices**  
   IS for sustainability, including carbon tracking systems, CSR impact, and design science contributions toward environmental or social responsibility.

10. **Theory Building, Methodological Innovation, and Meta-Level Scholarship in IS**  
    Includes literature reviews, methodological guidelines, theory synthesis, or frameworks for meta-theoretical advancement in the IS field.

## Output Format

Return a **JSON object** in this exact format:  
`{"category_name": X, 
   "rationale": "..."}`

Where,
- **category_number**: (integer) the number of the selected category (1-10) and the rationale is a brief explanation of why this category was chosen. Where `X` is a number from 1 to 10, 
- **rationale**: (string) a brief classification rationale, **no more than two sentences**, explaining why this category was chosen.

## Input Example

**Title:** Algorithm Sensemaking: How Platform Workers Make Sense of Algorithmic Management  
**Abstract:** This paper investigates how gig workers interpret algorithmic management by applying the sensemaking lens. Through a qualitative study of ride-hailing platforms, we examine how digital control mechanisms shape worker perceptions and strategic responses.

**Output:**  
```json
{
  "category_number": 3,
  "rationale": "The study apory to platform workers’ interpretations of algorithmic management, focusing on user behavior within a digital platform. It examines how control mechanisms influence perceptions and actions. This aligns most closely with social media and online community dynamics."
}
```

## Classification Rule

- Assign **only one** category.
- Choose the category that reflects the **primary theoretical or empirical contribution** of the study.
- If multiple categories are relevant, select the one with the **closest alignment to the core research question**.
- If the input does **not clearly fit any category**, return `{"category": 0}`.

Now classify the following input:

"""

input_prompt = """
# [System]
{system_prompt}

# [Input]
### **Title:** {title}
### **Abstract:** {abstract}
"""

prompt_template = PromptTemplate(
   template=input_prompt,
   input_variables=["system_prompt", "title", "abstract"]
)

prompt_template = prompt_template.partial(system_prompt=system_prompt)

In [37]:
# Prompt Template Testing
# Display the first paper's title and abstract for testing
# Approximately 1100 Tokens by One Paper

prompt = prompt_template.invoke(
    input={
        "title": papers_df.iloc[0]['title'],
        "abstract": papers_df.iloc[0]['abstract']
    }
)

display_markdown(prompt.text, raw=True)


# [System]

You are a classification assistant trained to identify the main research theme of academic papers published in the *Journal of the Association for Information Systems (JAIS)*.

You will be given the **title** and **abstract** of an IS (Information Systems) research paper.

Your task is to **assign exactly one of the following ten categories** that best represents the paper’s core theme. Each category is defined by a **specific scope of research problems, methods, or technological domain**.

## Category List

1. **AI- and Machine Learning-Driven Information System Design and Organizational Applications**  
   Research on how AI/ML technologies such as predictive models, algorithmic decision-making, or Generative AI are designed, deployed, or governed in IS contexts (e.g., business, healthcare, labor).

2. **IS Research on Information Security Behavior, Privacy Concerns, and Technology Adoption**  
   Covers behavioral theories (e.g., PMT, TPB), user responses to security policies, adoption of privacy-protecting technologies, and the paradox between personalization and privacy.

3. **User Behavior, Information Diffusion, and Community Dynamics in Social Media Platforms**  
   Studies related to user interactions, content sharing, social influence, misinformation, or platform governance (e.g., trolling, moderation, Q&A communities).

4. **Organizational-Level IS Strategy Research: Digital Transformation, IT Ambidexterity, and CIO Governance**  
   Research on digital innovation strategy, IT resource alignment, dynamic capabilities, and executive-level IT management.

5. **Data Analytics-Driven IS Performance and Value Creation for Organizations and Society**  
   Focus on the impact of analytics, big data, or data science on IS success, organizational performance, or public value, including systematic/meta analyses.

6. **Human–Computer Interaction, Interface Design, and User Experience (UX) in IS Research**  
   Includes studies on UI/UX design, cognitive aspects of interface use, and evaluation of digital agents, dashboards, or immersive systems (e.g., VR, fMRI studies).

7. **ICT Design for Digital Inclusion, Social Equity, and Access Among Marginalized Populations**  
   Explores how IS artifacts and ICTs support underserved users, promote fairness, or address digital divides (e.g., in fintech, education, governance).

8. **Digital Platform Business Models, User Engagement Mechanisms, and Market Transaction Design**  
   Focused on platform economics, pricing strategy, user-generated content, crowdfunding, and dual-sided market structure in digital platforms.

9. **Green IS, Corporate Social Responsibility, and Sustainable Information System Practices**  
   IS for sustainability, including carbon tracking systems, CSR impact, and design science contributions toward environmental or social responsibility.

10. **Theory Building, Methodological Innovation, and Meta-Level Scholarship in IS**  
    Includes literature reviews, methodological guidelines, theory synthesis, or frameworks for meta-theoretical advancement in the IS field.

## Output Format

Return a **JSON object** in this exact format:  
`{"category_name": X, 
   "rationale": "..."}`

Where,
- **category_number**: (integer) the number of the selected category (1-10) and the rationale is a brief explanation of why this category was chosen. Where `X` is a number from 1 to 10, 
- **rationale**: (string) a brief classification rationale, **no more than three sentences**, explaining why this category was chosen.

## Input Example

**Title:** Algorithm Sensemaking: How Platform Workers Make Sense of Algorithmic Management  
**Abstract:** This paper investigates how gig workers interpret algorithmic management by applying the sensemaking lens. Through a qualitative study of ride-hailing platforms, we examine how digital control mechanisms shape worker perceptions and strategic responses.

**Output:**  
```json
{
  "category_number": 3,
  "rationale": "The study apory to platform workers’ interpretations of algorithmic management, focusing on user behavior within a digital platform. It examines how control mechanisms influence perceptions and actions. This aligns most closely with social media and online community dynamics."
}
```

## Classification Rule

- Assign **only one** category.
- Choose the category that reflects the **primary theoretical or empirical contribution** of the study.
- If multiple categories are relevant, select the one with the **closest alignment to the core research question**.
- If the input does **not clearly fit any category**, return `{"category": 0}`.

Now classify the following input:



# [Input]
### **Title:** Explaining Persistent Ineffectiveness in Professional Online Communities: Multilevel Tensions and Misguided Coping Strategies
### **Abstract:** Abstract
Online communities (OCs) have become an increasingly prevalent way for organizations to bring people together to collaborate and create value. However, despite the abundance of extant literature, many studies still point to the lack of long-term sustainability of OCs. We contend that communities become dormant or obsolete over time because of manifestations of ineffectiveness a state of the community that hinders the attainment of individual and collective desired outcomes. While ineffectiveness in OCs is common, it is less apparent why such ineffectiveness persists. Two knowledge gaps are particularly significant here. First, while the multilevel nature of OCs is acknowledged, corresponding difficulties in aligning individual and collective interests and behaviors have often been neglected in past studies. Second, rare longitudinal studies have revealed that community members respond to ineffectiveness with various coping behaviors. However, the impact of these coping behaviors may not turn out as desired. Consequently, we investigate the persistence of ineffectiveness from the perspective of multilevel and coping effects, addressing the following research question: How and why does ineffectiveness persist in online communities? Our critical realist case study offers a three-step explanatory framework: (1) underlying multilevel tensions in the community contribute to usage ineffectiveness (i.e., members are unable to use the OC effectively); (2) misguided coping behaviors contribute to ineffective adaptation (i.e., members are unable to cope with not being able to use the OC effectively); and (3) ineffectiveness persists due to the interaction between usage and adaptation ineffectiveness.


## Model

In [81]:
model_gemini = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash-preview-04-17",
    temperature=0.0
)

model_gpt41 = ChatOpenAI(
    model="gpt-4.1-2025-04-14",
    temperature=0.0
)

model_gpt41mini = ChatOpenAI(
    model="gpt-4.1-mini-2025-04-14",
    temperature=0.0
)

model_gpt04mini = ChatOpenAI(
    model="o4-mini-2025-04-16",
)

In [None]:
# Model testing

# 프롬프트 포맷팅
formatted_prompt = prompt_template.format(
    title=papers_df.iloc[0]['title'],
    abstract=papers_df.iloc[0]['abstract']
)

# 모델에 직접 전달
model_response = model_gpt41mini.invoke(formatted_prompt)
print(model_response.text)

<bound method BaseMessage.text of AIMessage(content='{"category": 3}', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 7, 'prompt_tokens': 1144, 'total_tokens': 1151, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 1024}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_38647f5e19', 'id': 'chatcmpl-BT4jlXd15B1j6uqD2U6u0G8IWWpvz', 'finish_reason': 'stop', 'logprobs': None}, id='run-8f8609ee-c4a4-4f94-85c9-5a6b0821ed92-0', usage_metadata={'input_tokens': 1144, 'output_tokens': 7, 'total_tokens': 1151, 'input_token_details': {'audio': 0, 'cache_read': 1024}, 'output_token_details': {'audio': 0, 'reasoning': 0}})>


## Output Parser

In [39]:
from langchain_core.output_parsers import JsonOutputParser

OutputParser = JsonOutputParser()

In [54]:
from langchain_core.runnables import RunnableLambda

Formatter = RunnableLambda(lambda x: {
    "category_number": x["category_number"],
    "rationale": x["rationale"]
})


## Chain

In [None]:
Classifier_Chain = (
    prompt_template
    | RunnableParallel({
        "category_num_gemini" : model_gemini | OutputParser | Formatter, 
        "category_num_gpt41" : model_gpt41 | OutputParser | Formatter,
        "category_num_gpt41mini" : model_gpt41mini | OutputParser | Formatter,
        "category_num_gpto4mini" : model_gpt04mini | OutputParser | Formatter
    })
)

In [None]:
# # test
# test_input = {
#     "title": "AI 기반 의료 진단 시스템의 윤리적 고려사항",
#     "abstract": "이 논문은 인공지능을 활용한 의료 진단 시스템의 도입이 가져올 윤리적, 법적 문제를 분석한다. 특히 환자 개인정보 보호, 알고리즘 편향성, 설명가능성 등에 대해 논의한다."
# }

# result = Classifier_Chain.invoke(test_input)
# print(result)

{'category_num_gemini': {'category_number': 1, 'rationale': 'The paper analyzes ethical and legal issues, including privacy, bias, and explainability, arising from the introduction of AI-based medical diagnosis systems. This aligns with research on the governance and deployment of AI/ML technologies in an IS context like healthcare.'}, 'category_num_gpt41': {'category_number': 1, 'rationale': 'The paper analyzes ethical and legal issues related to the adoption of AI-based medical diagnostic systems, focusing on aspects like privacy, bias, and explainability. This aligns with research on AI/ML-driven IS design and their organizational applications, particularly in healthcare.'}, 'category_num_gpt41mini': {'category_number': 1, 'rationale': 'The paper focuses on AI-based medical diagnostic systems, addressing ethical and legal issues related to AI deployment such as algorithmic bias and explainability. This aligns with research on AI/ML-driven IS design and organizational applications.'}

In [None]:
# # 1. 입력 변환
# records = papers_df[["title", "abstract"]].to_dict(orient="records")

# # 2. 체인 실행
# results = Classifier_Chain.batch(records, config={"max_concurrency": 1})

# # 3. 모델별 결과 펼치기
# flattened_results = []
# for row in results:
#     flat_row = {}
#     for model_key, model_result in row.items():
#         for subkey, value in model_result.items():
#             flat_row[f"{subkey}_{model_key}"] = value
#     flattened_results.append(flat_row)

# # 4. DataFrame으로 변환 및 결합
# df_result = pd.DataFrame(flattened_results)
# papers_df = pd.concat([papers_df, df_result], axis=1)

In [89]:
import pandas as pd
from tqdm import tqdm  # 진행률 표시

# 1. 입력 리스트 생성
records = papers_df[["title", "abstract"]].to_dict(orient="records")

# 2. 각 record에 대해 순차적으로 체인 실행 + 진행률 표시
flattened_results = []

for record in tqdm(records, desc="Classifying Papers"):
    try:
        # 개별 실행
        result = Classifier_Chain.invoke(record, config={"run_name": "Classifier_Chain"})

        # 결과 펼치기
        flat_row = {}
        for model_key, model_result in result.items():
            for subkey, value in model_result.items():
                flat_row[f"{subkey}_{model_key}"] = value

        flattened_results.append(flat_row)

    except Exception as e:
        print(f"\nError on input: {record.get('title', '[No Title]')}")
        print(f"  → {e}")
        flattened_results.append({})  # 빈 결과로 채움

# 3. 결과 DataFrame 생성 및 결합
df_result = pd.DataFrame(flattened_results)
papers_df = pd.concat([papers_df, df_result], axis=1)


Classifying Papers: 100%|██████████| 170/170 [22:02<00:00,  7.78s/it]


In [90]:
papers_df.head(5)

Unnamed: 0,title,abstract,citation,url,year,volume,issue,category_name,rationale,category_number_category_num_gemini,rationale_category_num_gemini,category_number_category_num_gpt41,rationale_category_num_gpt41,category_number_category_num_gpt41mini,rationale_category_num_gpt41mini,category_number_category_num_gpt4omini,rationale_category_num_gpt4omini
0,Explaining Persistent Ineffectiveness in Profe...,Abstract\nOnline communities (OCs) have become...,"Recommended Citation\n\n\n Stein, Mari-Klar...",https://aisel.aisnet.org/jais//vol23/iss1/1/,2022,23,1,3,The paper focuses on the dynamics within profe...,3,The paper investigates user behavior (coping s...,3,"The paper investigates user behavior, coping s...",3,"The paper focuses on user behavior, community ...",3,The paper examines how multilevel tensions and...
1,A Design Theory for Energy and Carbon Manageme...,Abstract\nEnergy and carbon management systems...,"Recommended Citation\n\n\n Zampou, Eleni; M...",https://aisel.aisnet.org/jais//vol23/iss1/2/,2022,23,1,9,The paper focuses on designing energy and carb...,9,The paper focuses on the design and theory dev...,9,The paper develops a design theory for energy ...,9,The paper develops a design theory for energy ...,9,This paper develops a design science theory fo...
2,Examining the Impacts of Airbnb Review Policy ...,"Abstract\nIn July 2014, Airbnb, one of the big...","Recommended Citation\n\n\n Mousavi, Reza an...",https://aisel.aisnet.org/jais//vol23/iss1/3/,2022,23,1,3,The paper studies user-generated content and b...,8,The study examines how a policy change on a di...,8,The paper investigates the effects of a platfo...,3,The paper studies how a platform policy change...,8,The paper empirically evaluates a platform gov...
3,Inventing Together: The Role of Actor Goals an...,Abstract\nWith the ubiquity of the internet an...,"Recommended Citation\n\n\n Abhari, Kaveh; D...",https://aisel.aisnet.org/jais//vol23/iss1/4/,2022,23,1,8,The paper focuses on user engagement and parti...,8,The study investigates individual participatio...,8,The paper focuses on user engagement mechanism...,8,The paper focuses on user participation and be...,8,The study examines how platform affordances an...
4,The Design of a System for Online Psychosocial...,Abstract\nThe design of sensitive online healt...,"Recommended Citation\n\n\n Sjöström, Jonas;...",https://aisel.aisnet.org/jais//vol23/iss1/5/,2022,23,1,2,The paper focuses on designing an online healt...,2,The paper focuses on designing an information ...,2,The paper centers on balancing privacy and acc...,2,The paper focuses on designing an online healt...,2,This design science study focuses on embedding...


In [92]:
papers_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 170 entries, 0 to 169
Data columns (total 15 columns):
 #   Column                                  Non-Null Count  Dtype 
---  ------                                  --------------  ----- 
 0   title                                   170 non-null    object
 1   abstract                                170 non-null    object
 2   citation                                170 non-null    object
 3   url                                     170 non-null    object
 4   year                                    170 non-null    int64 
 5   volume                                  170 non-null    int64 
 6   issue                                   170 non-null    object
 7   category_number_category_num_gemini     170 non-null    int64 
 8   rationale_category_num_gemini           170 non-null    object
 9   category_number_category_num_gpt41      170 non-null    int64 
 10  rationale_category_num_gpt41            170 non-null    object
 11  catego

In [93]:
papers_df.to_csv('JAIS_papers_classified.csv', index=False, encoding='utf-8-sig')

# Validation by other models

In [None]:
category_cols = [
    "category_number_category_num_gemini",
    "category_number_category_num_gpt41",
    "category_number_category_num_gpt41mini",
    "category_number_category_num_gpto4mini",
]
# 각 행의 category 값 집합을 비교 → 길이가 1이면 만장일치
disagreed_df = papers_df[papers_df[category_cols].nunique(axis=1) > 2]
print(f"💡 총 {len(disagreed_df)}개의 논문이 모델 간 분류 불일치입니다.")
disagreed_df[[*category_cols, "title"]].head(10)


💡 총 9개의 논문이 모델 간 분류 불일치입니다.


Unnamed: 0,category_number_category_num_gemini,category_number_category_num_gpt41,category_number_category_num_gpt41mini,category_number_category_num_gpt4omini,title
7,1,5,5,10,In the Backrooms of Data Science
38,6,3,4,4,Network Configuration in App Design: The Effec...
54,2,2,7,9,On the Effectiveness of Smart Metering Technol...
67,3,4,4,10,Contextualizing Team Adaptation for Fostering ...
78,10,8,4,10,Design Principles for Platform-Enabled Knowled...
88,7,4,4,10,Design Theory for Societal Digital Transformat...
100,10,3,3,8,Qualitative Cusp Catastrophe Multi-Agent Simul...
108,5,8,4,8,The Impact of Feature Exploitation and Explora...
153,10,8,3,8,When Everyone Is Visible No One Is: Qualifica...


In [99]:
from collections import Counter

# 1. 분류 결과 컬럼명
category_cols = [
    "category_number_category_num_gemini",
    "category_number_category_num_gpt41",
    "category_number_category_num_gpt41mini",
    "category_number_category_num_gpt4omini",
]

# 2. 다수결 함수 정의
def majority_vote(row):
    votes = [row[col] for col in category_cols]
    vote_counts = Counter(votes)
    return vote_counts.most_common(1)[0][0]  # 가장 많이 나온 label 반환

# 3. 새 column 추가
papers_df["final_label_majority"] = papers_df.apply(majority_vote, axis=1)


In [101]:
papers_df.to_csv('JAIS_papers_classified.csv', index=False, encoding='utf-8-sig')