<a href="https://colab.research.google.com/github/AbdoAnss/NLP-for-Network-Analysis/blob/main/notebooks/data_cleaning_and_manipulation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Data Manipulation and Cleaning

The scraped data was cleaned and reshaped to prepare it for analysis and visualization. The following steps were taken:

1. **Data loading**: The scraped JSON files were loaded into a single Pandas DataFrame, with one row per company.
2. **Data inspection**: The DataFrame was inspected to identify missing values, duplicate rows, and inconsistent data types.
3. **Data cleaning**: The DataFrame was cleaned to remove loading artifacts and empty columns. This involved the following steps:
	* Converting every cell to a string and stripping the list brackets and quotes left over from loading.
	* Checking for duplicate companies using the `value_counts` method.
	* Removing columns that contain only missing values using the `dropna` method.
4. **Data transformation**: The DataFrame was transformed to improve its structure and readability. This involved the following steps:
	* Merging the two differently spelled `Key people` columns into one.
	* Renaming columns using the `rename` method.
	* Reordering columns using the `insert` method.
5. **Data enrichment**: The DataFrame was enriched with additional data from an external source:
	* Adding a new column called `description` that contains the Wikipedia summary for each company, retrieved with the `wikipedia` library.
6. **Data export**: The cleaned and transformed DataFrame was exported as a JSON file for further analysis and visualization.

The resulting DataFrame is more structured, consistent, and informative, which makes the later analysis and visualization easier.
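
The inspection step has no dedicated cell in this notebook. A minimal sketch of what such a check might look like is given here, using a small made-up DataFrame (the name `sample` and its values are illustrative assumptions, not the scraped data):

```python
import pandas as pd

# Toy stand-in for the scraped table (hypothetical values, for illustration only)
sample = pd.DataFrame(
    {
        "Company": ["A", "B", "B", "C"],
        "Industry": ["Telecom", "Chemicals", "Chemicals", None],
        "Founded": ["1998", "1920", "1920", None],
    }
)

# Count missing values per column
missing_per_column = sample.isna().sum()

# Count fully duplicated rows (every occurrence after the first)
duplicate_rows = int(sample.duplicated().sum())

# Column dtypes; scraped text typically shows up as object (string) columns
dtypes = sample.dtypes

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(dtypes)
```

The same three calls can be run on the real `df` once it has been built.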

In [None]:
import os
import json
import pandas as pd

In [None]:
# List all JSON files in the ./data/ folder
json_files = [f for f in os.listdir('./data/') if f.endswith('.json')]

In [None]:
# Create an empty list to store the data
all_data = []

In [None]:
# Iterate over the JSON files
for json_file in json_files:
    # Load the JSON file
    with open(os.path.join('./data/', json_file), 'r', encoding='utf-8') as f:
        data = json.load(f)

    # Extract the company name
    company = data[0]["Company"]

    # Create a dictionary for the DataFrame
    df_dict = {"Company": [company]}

    # Iterate over the rest of the data and add it to the dictionary
    for item in data[1:]:
        title = item["data"]["title"]
        content = item["data"]["content"]
        if title in df_dict:
            df_dict[title].append(content)
        else:
            df_dict[title] = [content]

    # Add the data to the list
    all_data.append(df_dict)

# Create the DataFrame
df = pd.DataFrame(all_data)
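
Because every value in `df_dict` is a list, each cell of `df` ends up holding a list, which is why brackets appear in the displayed table. As an alternative sketch (not the approach used in this notebook), the records could be flattened to plain strings while loading. The helper name `flatten_record` and the `"; "` separator are assumptions made for illustration, and `content` is assumed to be a string. The input layout mirrors the one read by the loop above.

```python
def flatten_record(data, sep="; "):
    """Turn one scraped file's content into a flat {column: string} dict.

    Assumes data[0] holds {"Company": ...} and every later item looks like
    {"data": {"title": ..., "content": ...}}, as in the loop above.
    """
    record = {"Company": data[0]["Company"]}
    for item in data[1:]:
        title = item["data"]["title"]
        content = item["data"]["content"]
        # Repeated titles are joined into one string instead of a list
        if title in record:
            record[title] = record[title] + sep + content
        else:
            record[title] = content
    return record


# Hypothetical input with a repeated title
example = [
    {"Company": "Example Co"},
    {"data": {"title": "Industry", "content": "Telecommunications"}},
    {"data": {"title": "Industry", "content": "Media"}},
    {"data": {"title": "Founded", "content": "1998"}},
]
flat = flatten_record(example)
print(flat)
```

A list of such dictionaries can be passed to `pd.DataFrame(...)` directly, which gives string cells and missing values for fields a company lacks.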


In [None]:
# Replace NaN values with None
df = df.where((pd.notna(df)), None)

In [None]:
df

Unnamed: 0,Company,Company type,Traded as,Industry,Founded,Headquarters,Key people,Products,Revenue,Net income,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,[Maroc Telecom],[Public],[Euronext Paris : IAM],[Telecommunications],"[February 3, 1998 ; 26 years ago ( 1998-02-03 )]","[Rabat , Morocco]","[Abdeslam Ahizoune , Chairman & CEO Laurent Ma...","[Landline phones , Mobile phone lines , Fiber-...","[US$ 3,6 billion (2018)]",[US$ 610 million (2018)],...,,,,,,,,,,
1,[OCP Group],,,"[Phosphates, Chemicals]",[1920],"[Casablanca , Morocco]",[Mostafa Terrab (Chairman)],,[US$11 billion (2022)],[US$1.6 billion dollars (2022)],...,,,,,,,,,,
2,[Inwi],[Private company],,[Telecommunications],[April 2009 ; 14 years ago ( 2009-04 ) (Jointl...,"[Casablanca , Morocco]","[Nadia Fassi Fehri, Chief Executive Officer (2...","[Fixed phones, Mobile phone lines, Digital Tel...",,,...,,,,,,,,,,
3,[Akwa Group S.A.],[Société anonyme],,,[1932 ( 1932 )],"[Casablanca , Morocco]",[Aziz Akhannouch Ali Wakrim Jamal Wakrim],,[$3 billion],,...,,,,,,,,,,
4,[Compagnie Marocaine de Navigation],[Subsidiary],,[Transport],[1946],"[Casablanca , Morocco]",[Taoufik Ibrahimi (PDG)],[Ferries Port services Passenger transportatio...,,,...,,,,,,,,,,
5,[FerriMaroc],,,,[November 1994],"[Nador , Morocco]",,,,,...,,,,,,,,,,
6,[Agma],,,[Finance/Insurance],,"[Casablanca , Morocco]",[Rachida Benabdallah (Managing Director)],,[MAD123.50 million (2013)],,...,,,,,,,,,,
7,[Banque Centrale Populaire],,,[Finance],,"[Casablanca , Morocco]",,[Financial services],"[DH 20,1 billion (2021)]","[DH 2,7 billion (2021)]",...,,,,,,,,,,
8,[Compagnie Générale Immobiliere],[Subsidiary],,[Real estate],[1960 ; 64 years ago ( 1960 )],,,,,,...,,,,,,,,,,
9,[Eco-Médias],[Société Anonyme],,[Media],[1991 ; 33 years ago ( 1991 )],"[Casablanca , Morocco]",[Marie-Thérèse Bourrut Abdelmounaïm Dilami],,,,...,,,,,,,,,,


In [None]:
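# Convert each element in the DataFrame to a string
# (note: this also turns missing values into the text 'None' or 'nan')
df = df.applymap(str)

# Remove the brackets from each element (raw strings avoid invalid-escape warnings)
df = df.replace({r'\[': '', r'\]': ''}, regex=True)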
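
The brackets and quotes that the surrounding cells strip out come from calling `str` on a list, which yields the list's `repr`. The sketch below shows that behaviour, together with a hypothetical helper `unwrap` (an assumption, not part of the notebook) that joins list items directly and so avoids the regex and quote-stripping steps:

```python
def unwrap(cell, sep="; "):
    """Join list items into one string; return any other value unchanged."""
    if isinstance(cell, list):
        # Also swap non-breaking spaces (\xa0) for ordinary spaces
        return sep.join(str(item).replace("\xa0", " ") for item in cell)
    return cell


raw = ["February\xa03, 1998"]

# str() on a list gives its repr: brackets, quotes, and an escaped \xa0
via_str = str(raw)

# Joining the items keeps only the text itself
via_unwrap = unwrap(raw)

print(via_str)
print(via_unwrap)
```

Applied cell by cell (for example with `df.applymap(unwrap)` on the list-valued DataFrame), this would leave missing values untouched instead of turning them into text, and would keep apostrophes that belong to the data.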


In [None]:
df.head()

Unnamed: 0,Company,Company type,Traded as,Industry,Founded,Headquarters,Key people,Products,Revenue,Net income,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,'Maroc Telecom','Public','Euronext Paris : IAM','Telecommunications',"'February\xa03, 1998 ; 26 years ago ( 1998-02-...","'Rabat , Morocco'","'Abdeslam Ahizoune , Chairman & CEO Laurent Ma...","'Landline phones , Mobile phone lines , Fiber-...","'US$ 3,6 billion (2018)'",'US$ 610 million (2018)',...,,,,,,,,,,
1,'OCP Group',,,"'Phosphates, Chemicals'",'1920',"'Casablanca , Morocco'",'Mostafa Terrab (Chairman)',,'US$11 \xa0billion\xa0(2022)','US$1.6 \xa0 billion dollars\xa0(2022)',...,,,,,,,,,,
2,'Inwi','Private company',,'Telecommunications',"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","'Casablanca , Morocco'","'Nadia Fassi Fehri, Chief Executive Officer (2...","'Fixed phones, Mobile phone lines, Digital Tel...",,,...,,,,,,,,,,
3,'Akwa Group S.A.','Société anonyme',,,'1932 ( 1932 )',"'Casablanca , Morocco'",'Aziz Akhannouch Ali Wakrim Jamal Wakrim',,'$3 billion',,...,,,,,,,,,,
4,'Compagnie Marocaine de Navigation','Subsidiary',,'Transport','1946',"'Casablanca , Morocco'",'Taoufik Ibrahimi (PDG)','Ferries Port services Passenger transportatio...,,,...,,,,,,,,,,


In [None]:
pip install --upgrade nlpia2-wikipedia --quiet


In [None]:
df['Company'].value_counts()

'Maroc Telecom'                        1
'OCP Group'                            1
'2M TV'                                1
'Les Domaines Agricoles'               1
'Casablanca Stock Exchange (CSE)'      1
'BMCI'                                 1
'Casa Air Service'                     1
'Jet4you'                              1
'Comarit'                              1
'FRS Iberia Maroc'                     1
'Air Arabia Maroc'                     1
'Laraki Automobiles SA'                1
'Compagnie de Transports au Maroc'     1
'Eco-Médias'                           1
'Compagnie Générale Immobiliere'       1
'Banque Centrale Populaire'            1
'Agma'                                 1
'FerriMaroc'                           1
'Compagnie Marocaine de Navigation'    1
'Akwa Group S.A.'                      1
'Inwi'                                 1
'Attijariwafa bank'                    1
Name: Company, dtype: int64

In [None]:
# Remove single quotes from all elements in the DataFrame
df = df.applymap(lambda x: x.replace("'", ""))


In [None]:
df.head()

Unnamed: 0,Company,Company type,Traded as,Industry,Founded,Headquarters,Key people,Products,Revenue,Net income,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,Maroc Telecom,Public,Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Abdeslam Ahizoune , Chairman & CEO Laurent Mai...","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",US$ 610 million (2018),...,,,,,,,,,,
1,OCP Group,,,"Phosphates, Chemicals",1920,"Casablanca , Morocco",Mostafa Terrab (Chairman),,US$11 \xa0billion\xa0(2022),US$1.6 \xa0 billion dollars\xa0(2022),...,,,,,,,,,,
2,Inwi,Private company,,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Nadia Fassi Fehri, Chief Executive Officer (20...","Fixed phones, Mobile phone lines, Digital Tele...",,,...,,,,,,,,,,
3,Akwa Group S.A.,Société anonyme,,,1932 ( 1932 ),"Casablanca , Morocco",Aziz Akhannouch Ali Wakrim Jamal Wakrim,,$3 billion,,...,,,,,,,,,,
4,Compagnie Marocaine de Navigation,Subsidiary,,Transport,1946,"Casablanca , Morocco",Taoufik Ibrahimi (PDG),Ferries Port services Passenger transportation...,,,...,,,,,,,,,,


In [None]:
import wikipedia

# Define a function to get the Wikipedia summary for a company
def get_wiki_summary(company):
    try:
        # Get the Wikipedia page for the company
        page = wikipedia.page(company)

        # Return the summary of the Wikipedia page
        return page.summary
    except (wikipedia.exceptions.DisambiguationError,
            wikipedia.exceptions.PageError,
            KeyError):
        # If the company name is ambiguous, the Wikipedia page does not exist,
        # or there is an error retrieving the page, return None
        return None

# Apply the get_wiki_summary function to each company in the DataFrame
df['description'] = df['Company'].apply(get_wiki_summary)


In [None]:
df.head(n=3)

Unnamed: 0,Company,Company type,Traded as,Industry,Founded,Headquarters,Key people,Products,Revenue,Net income,...,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity,description
0,Maroc Telecom,Public,Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Abdeslam Ahizoune , Chairman & CEO Laurent Mai...","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",US$ 610 million (2018),...,,,,,,,,,,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا..."
1,OCP Group,,,"Phosphates, Chemicals",1920,"Casablanca , Morocco",Mostafa Terrab (Chairman),,US$11 \xa0billion\xa0(2022),US$1.6 \xa0 billion dollars\xa0(2022),...,,,,,,,,,,The OCP Group (OCP S.A.) (formerly Office Chér...
2,Inwi,Private company,,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Nadia Fassi Fehri, Chief Executive Officer (20...","Fixed phones, Mobile phone lines, Digital Tele...",,,...,,,,,,,,,,Inwi (Arabic: إنوي) (formerly known as Wana) ...


In [None]:
# Select only the Company and description columns
df_subset = df[['Company', 'description']]

# Print the subset of the DataFrame
df_subset


Unnamed: 0,Company,description
0,Maroc Telecom,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا..."
1,OCP Group,The OCP Group (OCP S.A.) (formerly Office Chér...
2,Inwi,Inwi (Arabic: إنوي) (formerly known as Wana) ...
3,Akwa Group S.A.,Akwa Group S.A. is a conglomerate company head...
4,Compagnie Marocaine de Navigation,The Compagnie Marocaine de Navigation or Coman...
5,FerriMaroc,FerriMaroc was a Moroccan ferry company which ...
6,Agma,"The voiced velar nasal, also known as agma, fr..."
7,Banque Centrale Populaire,Banque Centrale Populaire is a major bank in M...
8,Compagnie Générale Immobiliere,"Compagnie Générale Immobiliere, or CGI is a Mo..."
9,Eco-Médias,Eco-Médias (Arabic: ايكوميديا) is a Moroccan m...
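
A title lookup can land on an unrelated page: in the table above, `Agma` resolved to an article about the voiced velar nasal rather than the company. One rough way to flag such rows is to check whether the summary opens with the company name. The function name `summary_starts_with_company` and the matching rule are assumptions for illustration. This is a heuristic, not a guarantee of relevance.

```python
def summary_starts_with_company(company, summary):
    """Heuristic: does the summary open with the company name?

    A leading "The " is ignored and the comparison is case-insensitive.
    """
    if not company or not summary:
        return False
    text = summary.strip().casefold()
    if text.startswith("the "):
        text = text[4:]
    return text.startswith(company.strip().casefold())


# Opening words of three summaries shown in the table above
checks = {
    "OCP Group": "The OCP Group (OCP S.A.) (formerly Office Chér",
    "Inwi": "Inwi (Arabic: إنوي) (formerly known as Wana)",
    "Agma": "The voiced velar nasal, also known as agma, fr",
}
flags = {
    name: summary_starts_with_company(name, text)
    for name, text in checks.items()
}
print(flags)
```

Rows flagged `False` could then be reviewed by hand or have their `description` reset to `None`.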


In [None]:
# Insert the description column after the Company type column
df.insert(2, 'description', df.pop('description'))

# Print the reordered DataFrame
df


Unnamed: 0,Company,Company type,description,Traded as,Industry,Founded,Headquarters,Key people,Products,Revenue,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,Maroc Telecom,Public,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا...",Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Abdeslam Ahizoune , Chairman & CEO Laurent Mai...","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",...,,,,,,,,,,
1,OCP Group,,The OCP Group (OCP S.A.) (formerly Office Chér...,,"Phosphates, Chemicals",1920,"Casablanca , Morocco",Mostafa Terrab (Chairman),,US$11 \xa0billion\xa0(2022),...,,,,,,,,,,
2,Inwi,Private company,Inwi (Arabic: إنوي) (formerly known as Wana) ...,,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Nadia Fassi Fehri, Chief Executive Officer (20...","Fixed phones, Mobile phone lines, Digital Tele...",,...,,,,,,,,,,
3,Akwa Group S.A.,Société anonyme,Akwa Group S.A. is a conglomerate company head...,,,1932 ( 1932 ),"Casablanca , Morocco",Aziz Akhannouch Ali Wakrim Jamal Wakrim,,$3 billion,...,,,,,,,,,,
4,Compagnie Marocaine de Navigation,Subsidiary,The Compagnie Marocaine de Navigation or Coman...,,Transport,1946,"Casablanca , Morocco",Taoufik Ibrahimi (PDG),Ferries Port services Passenger transportation...,,...,,,,,,,,,,
5,FerriMaroc,,FerriMaroc was a Moroccan ferry company which ...,,,November 1994,"Nador , Morocco",,,,...,,,,,,,,,,
6,Agma,,"The voiced velar nasal, also known as agma, fr...",,Finance/Insurance,,"Casablanca , Morocco",Rachida Benabdallah (Managing Director),,MAD123.50 million (2013),...,,,,,,,,,,
7,Banque Centrale Populaire,,Banque Centrale Populaire is a major bank in M...,,Finance,,"Casablanca , Morocco",,Financial services,"DH 20,1 billion (2021)",...,,,,,,,,,,
8,Compagnie Générale Immobiliere,Subsidiary,"Compagnie Générale Immobiliere, or CGI is a Mo...",,Real estate,1960 ; 64\xa0years ago ( 1960 ),,,,,...,,,,,,,,,,
9,Eco-Médias,Société Anonyme,Eco-Médias (Arabic: ايكوميديا) is a Moroccan m...,,Media,1991 ; 33\xa0years ago ( 1991 ),"Casablanca , Morocco",Marie-Thérèse Bourrut Abdelmounaïm Dilami,,,...,,,,,,,,,,


In [None]:
print(df.columns.tolist())

['Company', 'Company type', 'description', 'Traded as', 'Industry', 'Founded', 'Headquarters', 'Key people', 'Products', 'Revenue', 'Net income', 'Owners', 'Number of employees', 'Website', 'Owner', 'Subsidiaries', 'Founder', 'Parent', 'Fate', 'Area served', 'Services', 'Total assets', 'Native name', 'Commenced operations', 'Operating bases', 'Secondary hubs', 'Fleet size', 'Destinations', 'Parent company', 'Defunct', 'Ceased operations', 'Focus cities', 'Hubs', 'ISIN', 'Type', 'Location', 'Key\xa0people', 'Currency', 'Market cap', 'Volume', 'Indices', 'Country', 'Broadcast area', 'Programming', 'Language(s)', 'Picture format', 'Ownership', 'History', 'Launched', 'Links', 'Availability', 'Terrestrial', 'DTT (Morocco)', 'Streaming media', 'CASSETTE Audio Video Visual', 'Ziggo GO', 'Number of locations', 'Total equity']
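
The column list contains both `'Key people'` and `'Key\xa0people'`, which differ only by a non-breaking space. A minimal sketch for spotting such near-duplicate names follows, assuming Unicode NFKC normalization is an acceptable way to compare them (the helper name `find_colliding_columns` is hypothetical):

```python
import unicodedata
from collections import defaultdict


def find_colliding_columns(columns):
    """Group column names that become identical after NFKC normalization."""
    groups = defaultdict(list)
    for name in columns:
        # NFKC maps the non-breaking space U+00A0 to a regular space;
        # split/join then collapses any repeated whitespace
        normalized = " ".join(unicodedata.normalize("NFKC", name).split())
        groups[normalized].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


columns = ["Company", "Key people", "Founded", "Key\xa0people"]
collisions = find_colliding_columns(columns)
print(collisions)
```

Running it on `df.columns` would list every group of names that should probably be merged.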


In [None]:
# Merge 'Key people' and 'Key\xa0people' (the same field, spelled with a
# non-breaking space) into a single column. After the earlier str() conversion,
# missing values may be stored as the text 'None' or 'nan', so treat those as null.
placeholders = ['None', 'nan']
key_people = df['Key people'].mask(df['Key people'].isin(placeholders))
key_people_nbsp = df['Key\xa0people'].mask(df['Key\xa0people'].isin(placeholders))

# Prefer 'Key people'; fall back to 'Key\xa0people' where it is missing
df['Combined_Key_People'] = key_people.combine_first(key_people_nbsp)

# Drop the original columns
df.drop(['Key people', 'Key\xa0people'], axis=1, inplace=True)

In [None]:
print(df.columns.tolist())

['Company', 'Company type', 'description', 'Traded as', 'Industry', 'Founded', 'Headquarters', 'Products', 'Revenue', 'Net income', 'Owners', 'Number of employees', 'Website', 'Owner', 'Subsidiaries', 'Founder', 'Parent', 'Fate', 'Area served', 'Services', 'Total assets', 'Native name', 'Commenced operations', 'Operating bases', 'Secondary hubs', 'Fleet size', 'Destinations', 'Parent company', 'Defunct', 'Ceased operations', 'Focus cities', 'Hubs', 'ISIN', 'Type', 'Location', 'Currency', 'Market cap', 'Volume', 'Indices', 'Country', 'Broadcast area', 'Programming', 'Language(s)', 'Picture format', 'Ownership', 'History', 'Launched', 'Links', 'Availability', 'Terrestrial', 'DTT (Morocco)', 'Streaming media', 'CASSETTE Audio Video Visual', 'Ziggo GO', 'Number of locations', 'Total equity', 'Combined_Key_People']


In [None]:
df.head()

Unnamed: 0,Company,Company type,description,Traded as,Industry,Founded,Headquarters,Products,Revenue,Net income,...,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity,Combined_Key_People
0,Maroc Telecom,Public,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا...",Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",US$ 610 million (2018),...,,,,,,,,,,"Abdeslam Ahizoune , Chairman & CEO Laurent Mai..."
1,OCP Group,,The OCP Group (OCP S.A.) (formerly Office Chér...,,"Phosphates, Chemicals",1920,"Casablanca , Morocco",,US$11 \xa0billion\xa0(2022),US$1.6 \xa0 billion dollars\xa0(2022),...,,,,,,,,,,Mostafa Terrab (Chairman)
2,Inwi,Private company,Inwi (Arabic: إنوي) (formerly known as Wana) ...,,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Fixed phones, Mobile phone lines, Digital Tele...",,,...,,,,,,,,,,"Nadia Fassi Fehri, Chief Executive Officer (20..."
3,Akwa Group S.A.,Société anonyme,Akwa Group S.A. is a conglomerate company head...,,,1932 ( 1932 ),"Casablanca , Morocco",,$3 billion,,...,,,,,,,,,,Aziz Akhannouch Ali Wakrim Jamal Wakrim
4,Compagnie Marocaine de Navigation,Subsidiary,The Compagnie Marocaine de Navigation or Coman...,,Transport,1946,"Casablanca , Morocco",Ferries Port services Passenger transportation...,,,...,,,,,,,,,,Taoufik Ibrahimi (PDG)


In [None]:
# Rename the 'Combined_Key_People' column to 'Influential People'
df.rename(columns={'Combined_Key_People': 'Influential People'}, inplace=True)

In [None]:
# Insert the Influential People column after the Company description column
df.insert(3, 'Influential People', df.pop('Influential People'))

# Print the reordered DataFrame
df.head(n=3)

Unnamed: 0,Company,Company type,description,Influential People,Traded as,Industry,Founded,Headquarters,Products,Revenue,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,Maroc Telecom,Public,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا...","Abdeslam Ahizoune , Chairman & CEO Laurent Mai...",Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",...,,,,,,,,,,
1,OCP Group,,The OCP Group (OCP S.A.) (formerly Office Chér...,Mostafa Terrab (Chairman),,"Phosphates, Chemicals",1920,"Casablanca , Morocco",,US$11 \xa0billion\xa0(2022),...,,,,,,,,,,
2,Inwi,Private company,Inwi (Arabic: إنوي) (formerly known as Wana) ...,"Nadia Fassi Fehri, Chief Executive Officer (20...",,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Fixed phones, Mobile phone lines, Digital Tele...",,...,,,,,,,,,,


In [None]:
# Specify the output file path
output_file = './output.json'

# Export the DataFrame as a JSON file
df.to_json(output_file, orient='records')


In [None]:
import pandas as pd

# Load the DataFrame from the output.json file
df = pd.read_json('output.json')

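# Treat the placeholder strings left by the earlier str() conversion as missing,
# then drop columns that have only null values
df = df.mask(df.isin(['None', 'nan']))
df = df.dropna(axis=1, how='all')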

# Export the modified DataFrame to a new JSON file
df.to_json('output_new.json', orient='records', indent=4)
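
By default `to_json` escapes non-ASCII characters, so the Arabic text in `description` is written as `\u` escape sequences. A hedged sketch of an export that keeps such text readable via `force_ascii=False` is shown here on a toy DataFrame (the name `toy` and its values are illustrative assumptions):

```python
import io

import pandas as pd

# Toy frame standing in for the cleaned data (values are illustrative)
toy = pd.DataFrame(
    {
        "Company": ["Inwi", "Eco-Médias"],
        "description": ["Inwi (Arabic: إنوي)", None],
    }
)

# With no path given, to_json returns the JSON text;
# force_ascii=False keeps non-ASCII characters as-is instead of \u escapes
json_text = toy.to_json(orient="records", indent=4, force_ascii=False)

# Round trip: read the records back into a DataFrame
restored = pd.read_json(io.StringIO(json_text))

print(json_text)
```

Passing the same keyword arguments together with a file path, as in the cell above, should produce the same text on disk.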

In [None]:
df.head()

Unnamed: 0,Company,Company type,description,Influential People,Traded as,Industry,Founded,Headquarters,Products,Revenue,...,Launched,Links,Availability,Terrestrial,DTT (Morocco),Streaming media,CASSETTE Audio Video Visual,Ziggo GO,Number of locations,Total equity
0,Maroc Telecom,Public,"Maroc Telecom (Acronym: IAM, Arabic: اتصالات ا...","Abdeslam Ahizoune , Chairman & CEO Laurent Mai...",Euronext Paris : IAM,Telecommunications,"February\xa03, 1998 ; 26 years ago ( 1998-02-03 )","Rabat , Morocco","Landline phones , Mobile phone lines , Fiber-o...","US$ 3,6 billion (2018)",...,,,,,,,,,,
1,OCP Group,,The OCP Group (OCP S.A.) (formerly Office Chér...,Mostafa Terrab (Chairman),,"Phosphates, Chemicals",1920,"Casablanca , Morocco",,US$11 \xa0billion\xa0(2022),...,,,,,,,,,,
2,Inwi,Private company,Inwi (Arabic: إنوي) (formerly known as Wana) ...,"Nadia Fassi Fehri, Chief Executive Officer (20...",,Telecommunications,"""April\xa02009 ; 14\xa0years ago ( 2009-04 ) (...","Casablanca , Morocco","Fixed phones, Mobile phone lines, Digital Tele...",,...,,,,,,,,,,
3,Akwa Group S.A.,Société anonyme,Akwa Group S.A. is a conglomerate company head...,Aziz Akhannouch Ali Wakrim Jamal Wakrim,,,1932 ( 1932 ),"Casablanca , Morocco",,$3 billion,...,,,,,,,,,,
4,Compagnie Marocaine de Navigation,Subsidiary,The Compagnie Marocaine de Navigation or Coman...,Taoufik Ibrahimi (PDG),,Transport,1946,"Casablanca , Morocco",Ferries Port services Passenger transportation...,,...,,,,,,,,,,
