<a href="https://colab.research.google.com/github/ampazio/hallow-universe/blob/main/Creating_nodes_edges.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**First step: lexical analysis in MAXQDA**.

Open MAXDictio > Dictionary and add categories. One category = one code, so write multi-word entity names without spaces. To each category, add the word variations that the dictionary should recognize. For example, if the category is JonathanRoumie, add "Jonathan Roumie" as an entry.
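
The idea behind the dictionary can be illustrated outside MAXQDA. The following is only a minimal sketch of dictionary-based coding, not a description of how MAXDictio works internally. The `ExampleActor` category, the extra variations, and the `code_text` helper are hypothetical.

```python
import re

# Hypothetical dictionary: one category (code) -> the word variations it should match
dictionary = {
    'JonathanRoumie': ['Jonathan Roumie', 'Roumie'],
    'ExampleActor': ['Example Actor'],
}


def code_text(text, dictionary):
    """Return the set of categories whose variations occur in the text."""
    found = set()
    for category, variations in dictionary.items():
        for variation in variations:
            # Whole-word, case-insensitive match of the variation
            pattern = r'\b' + re.escape(variation) + r'\b'
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(category)
                break
    return found


codes = code_text('An interview with Jonathan Roumie about the series.', dictionary)
print(codes)
```

Note that the category name has no spaces, while the variations it matches may contain them.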

We then conduct a content analysis based on the dictionary and code our documents automatically with the results.

**Second step: Co-occurrence analysis in MAXQDA**. We have to make sure that each code name is a single word (no spaces).

Activate all documents and all codes.

Go to "Analysis" and choose "Complex Coding Query". In the window that opens, under "Function" choose "Intersection (set)", make sure all codes are activated, and set the minimum number of codes to 2.
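
Conceptually, this query keeps only the segments to which at least two of the activated codes are assigned. A minimal sketch of that filter, with hypothetical segment names and codes, could look like this:

```python
# Hypothetical segments, each with the set of codes assigned to it
segments = {
    'segment_1': {'JonathanRoumie', 'ExampleActor'},
    'segment_2': {'JonathanRoumie'},
    'segment_3': {'JonathanRoumie', 'ExampleActor', 'AnotherActor'},
}

MIN_CODES = 2  # corresponds to the minimum number of codes set in the query

# Keep only segments where at least MIN_CODES codes co-occur
co_occurring = {
    name: codes for name, codes in segments.items() if len(codes) >= MIN_CODES
}
print(sorted(co_occurring))
```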

Then in the "Retrieved segments" section in the MAXQDA bottom panel choose "Export as Excel file". Choose "other codes ascribed to segment". Save the file.

Open the file in Excel and save it in .csv UTF-8 format.

Upload this .csv file to Colab and insert its path into the script below.
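
Before editing `file_path`, it can help to confirm the exact name of the uploaded file. The sketch below assumes the file was uploaded to `/content`, the default Colab working directory. The `find_csv_files` helper is hypothetical.

```python
from pathlib import Path


def find_csv_files(folder):
    """Return the sorted paths of all .csv files directly inside folder."""
    folder = Path(folder)
    if not folder.is_dir():
        return []
    return sorted(str(path) for path in folder.glob('*.csv'))


# Assumption: uploads land in /content, as they do in a default Colab session
for path in find_csv_files('/content'):
    print(path)
```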

In [None]:
import pandas as pd

# Import the CSV file
file_path = '/content/MAXQDA 2_KODY.csv'  # Change to your file name
data = pd.read_csv(file_path, sep=';', encoding='utf-8')

# Rename the column 'Inne kody przypisane do segmentu' (Polish for "Other codes ascribed to segment") to 'kod'
data.rename(columns={'Inne kody przypisane do segmentu': 'kod'}, inplace=True)

# Clean data in the 'kod' column
if 'kod' in data.columns:
    data['kod'] = data['kod'].astype(str).str.replace(r'\(Waga: 0\)', '', regex=True)  # Remove the phrases "(Waga: 0)" (Polish for "Weight: 0")
    data['kod'] = data['kod'].str.replace('"', '', regex=False)  # Remove superfluous quotation marks
    data['kod'] = data['kod'].str.split().apply(lambda x: ';'.join(filter(None, x)))  # Combine remaining words with semi-colons

# Save the processed file
output_path = 'przetworzony_plik.csv'  # Path to file
data.to_csv(output_path, sep=';', index=False, encoding='utf-8')

print(f"The file has been processed and saved as {output_path}")


The file has been processed and saved as przetworzony_plik.csv
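
To see what the cleaning steps do, here is a minimal sketch applied to a single cell value. The value is hypothetical: it assumes the export lists codes separated by spaces, each followed by "(Waga: 0)", which is what the script above is written to handle. The real export may differ in detail.

```python
import pandas as pd

# Hypothetical cell value from the 'kod' column
raw = pd.Series(['"JonathanRoumie (Waga: 0) ExampleActor (Waga: 0)"'])

cleaned = (
    raw.str.replace(r'\(Waga: 0\)', '', regex=True)  # drop the weight phrases
       .str.replace('"', '', regex=False)            # drop quotation marks
       .str.split()                                  # split on whitespace
       .apply(';'.join)                              # rejoin with semicolons
)
print(cleaned[0])
```

This is also why code names must not contain spaces: a code with a space would be split into two separate codes.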


Our file has been cleaned; we can now load it in the script below.

In [None]:
import pandas as pd

# Load data, using semi-colons as delimiters
data = pd.read_csv('/content/przetworzony_plik.csv', delimiter=';')

# Verify column names
print(data.columns)

# Remove superfluous spaces in column names
data.columns = data.columns.str.strip()

# Replace missing values in 'kod' with empty strings
data['kod'] = data['kod'].fillna('')

# Split the 'kod' column into lists of code names, dropping empty entries so they do not become nodes
data['kod'] = data['kod'].str.split(';').apply(lambda codes: [c for c in codes if c])

# Create list of all unique codes (actors from the 'kod' column)
all_codes = set()
for codes in data['kod']:
    all_codes.update(codes)

# Assign a unique ID number to each code (sorted so that IDs are the same on every run)
code_to_id = {code: idx + 1 for idx, code in enumerate(sorted(all_codes))}

# Replace codes with their IDs in the 'kod' column
data['kod'] = data['kod'].apply(lambda x: [code_to_id[code] for code in x])

# Creating a list of edges (relationships between codes within the same segment)
edges = []

# For each data row
for idx, row in data.iterrows():
    codes = row['kod']  # List of codes ascribed to segment

    # Create unique code pairs within each segment
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            edges.append([codes[i], codes[j], 'Undirected', 1])

# Create DataFrame for edges
edges_df = pd.DataFrame(edges, columns=['Source', 'Target', 'Type', 'Weight'])

# Create DataFrame for nodes (code_id and code_name)
nodes = pd.DataFrame({
    'Id': [code_to_id[code] for code in all_codes],
    'Label': [code for code in all_codes]
})

# Save CSV files
nodes.to_csv('nodes.csv', index=False, sep=';')  # Nodes (codes)
edges_df.to_csv('edges.csv', index=False, sep=';')  # Edges (relationships between codes)

print("The files nodes.csv and edges.csv have been generated.")


Index(['Segment', 'kod'], dtype='object')
The files nodes.csv and edges.csv have been generated.
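
The script above writes one row with weight 1 for every co-occurrence, so the same pair of codes can appear many times. If a single weighted row per pair is preferred, the repeats can be summed before export. The following is a minimal sketch on a small hypothetical edge list with the same columns as edges.csv. It is an optional extra step, not part of the original workflow.

```python
import pandas as pd

# Hypothetical edge list in the same shape as edges.csv
edges_df = pd.DataFrame(
    [[1, 2, 'Undirected', 1], [2, 1, 'Undirected', 1], [1, 3, 'Undirected', 1]],
    columns=['Source', 'Target', 'Type', 'Weight'],
)

# Order each pair so that (2, 1) and (1, 2) count as the same undirected edge
low = edges_df[['Source', 'Target']].min(axis=1)
high = edges_df[['Source', 'Target']].max(axis=1)
edges_df['Source'], edges_df['Target'] = low, high

# Sum the weights of repeated pairs
weighted = edges_df.groupby(['Source', 'Target', 'Type'], as_index=False)['Weight'].sum()
print(weighted)
```

The resulting `Weight` column then counts how many segments each pair of codes shares.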


Import both files into Gephi: nodes.csv as the nodes table and edges.csv as the edges table.
Visualize the relationships between the nodes with one of the available layout algorithms, such as ForceAtlas 2. Node size can be set to reflect each node's degree centrality, and communities can be identified with the Modularity statistic.
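
As a cross-check on what Gephi reports, degree can also be computed directly from an edge list. This is a minimal sketch on hypothetical data, not Gephi's implementation. It computes the normalized degree centrality (number of distinct neighbours divided by the number of other nodes) and the weighted degree (sum of the weights of incident edges).

```python
from collections import defaultdict

# Hypothetical weighted edge list: (source_id, target_id, weight)
edges = [(1, 2, 2), (1, 3, 1), (2, 3, 1), (3, 4, 1)]

neighbours = defaultdict(set)
weighted_degree = defaultdict(int)
for source, target, weight in edges:
    neighbours[source].add(target)
    neighbours[target].add(source)
    weighted_degree[source] += weight
    weighted_degree[target] += weight

# Normalized degree centrality: distinct neighbours / (number of nodes - 1)
n_nodes = len(neighbours)
degree_centrality = {
    node: len(adjacent) / (n_nodes - 1) for node, adjacent in neighbours.items()
}

for node in sorted(degree_centrality):
    print(node, round(degree_centrality[node], 3), weighted_degree[node])
```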