# Python Interview Questions (71-80)

---



### Q71. Write code to insert commas between characters of all elements in an array.


If you have an array of strings and you want to insert commas between the characters of each element, you can use a list comprehension along with the join() method.

Example:

In [1]:
import numpy as np

# Create an array of strings
arr = np.array(['abc', 'def', 'ghi'])

# Insert commas between characters of each element
result = np.array([','.join(list(word)) for word in arr])

print(result)

['a,b,c' 'd,e,f' 'g,h,i']


In this example, ','.join(list(word)) is used inside a list comprehension to insert commas between the characters of each string in the array. The result is a new array with the modified strings.

Note that if the strings have different lengths, so will the results: a string with k characters receives k - 1 commas, and NumPy stores the results in a fixed-width string dtype sized to the longest one. If you need the results aligned, you will have to pad the strings to a common width first.

If you have a NumPy array of integers or floats, convert each element to a string before inserting commas. The example below calls str() on each element inside the list comprehension (arr.astype(str) would convert the whole array at once):

In [2]:
import numpy as np

# Create an array of integers
arr = np.array([123, 456, 789])

# Insert commas between digits of each element
result = np.array([','.join(list(str(num))) for num in arr])

print(result)

['1,2,3' '4,5,6' '7,8,9']


In this case, ','.join(list(str(num))) converts each number to a string before inserting commas between its digits.
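NumPy's np.char.join() is a possible alternative to the list comprehension. It inserts a separator between the characters of every string element in one vectorized call. The sketch below applies it to the string array from the first example. It then applies it to the integer array after converting that array with astype(str):

```python
import numpy as np

# np.char.join(sep, seq) inserts sep between the characters of each element
words = np.array(['abc', 'def', 'ghi'])
print(np.char.join(',', words))  # ['a,b,c' 'd,e,f' 'g,h,i']

# Numbers must become strings first; astype(str) converts the whole array
numbers = np.array([123, 456, 789])
print(np.char.join(',', numbers.astype(str)))  # ['1,2,3' '4,5,6' '7,8,9']
```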

### Q72. How can you add 1 to all sides of an existing array?


To surround an existing array with a border of 1s (rather than adding 1 to each element), you can use the NumPy function np.pad(). It pads an array along each dimension. With the default mode='constant', the new cells are filled with the value given by constant_values, and the mode parameter lets you choose other padding strategies.

An example of adding 1 to all sides of a 2D array:

In [3]:
import numpy as np

# Create a 2D array
arr = np.array([[2, 3, 4],
                [5, 6, 7]])

# Pad the array with 1 on all sides
padded_arr = np.pad(arr, pad_width=1, constant_values=1)

print("Original array:")
print(arr)

print("\nArray with 1 added to all sides:")
print(padded_arr)

Original array:
[[2 3 4]
 [5 6 7]]

Array with 1 added to all sides:
[[1 1 1 1 1]
 [1 2 3 4 1]
 [1 5 6 7 1]
 [1 1 1 1 1]]


In this example, np.pad() is used with pad_width=1 to add a border of 1 to all sides of the original array. The constant_values=1 parameter specifies that the padding values should be 1.

You can customize the padding values and widths as needed. The mode parameter selects the padding strategy, such as 'constant', 'edge' or 'symmetric'. The example relies on the default, mode='constant', which fills the border with constant_values.
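As an illustration of that customization, here is a sketch that uses a different width on each side and then switches to mode='edge'. The widths are arbitrary choices for demonstration:

```python
import numpy as np

arr = np.array([[2, 3, 4],
                [5, 6, 7]])

# Per-side widths: ((top, bottom), (left, right)); 1 row on top, 2 columns on the right
uneven = np.pad(arr, pad_width=((1, 0), (0, 2)), mode='constant', constant_values=1)
print(uneven)
# [[1 1 1 1 1]
#  [2 3 4 1 1]
#  [5 6 7 1 1]]

# 'edge' mode repeats the outermost values instead of using a constant
edged = np.pad(arr, pad_width=1, mode='edge')
print(edged)
# [[2 2 3 4 4]
#  [2 2 3 4 4]
#  [5 5 6 7 7]
#  [5 5 6 7 7]]
```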

### Q73. How can we swap axes of a numpy array?


You can swap axes of a NumPy array with the swapaxes() method (also available as the function np.swapaxes()) or with the np.transpose() function. Both rearrange the dimensions of the array. Here's an example using both approaches:

In [4]:
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

# Using swapaxes()
swapped_arr = arr.swapaxes(0, 1)

print("Original array:")
print(arr)

print("\nArray with swapped axes using swapaxes():")
print(swapped_arr)

# Using transpose()
transposed_arr = np.transpose(arr)

print("\nArray with swapped axes using transpose():")
print(transposed_arr)

Original array:
[[1 2 3]
 [4 5 6]]

Array with swapped axes using swapaxes():
[[1 4]
 [2 5]
 [3 6]]

Array with swapped axes using transpose():
[[1 4]
 [2 5]
 [3 6]]


In the example, the swapaxes() method is used to swap the axes along dimensions 0 and 1, effectively transposing the array. The transpose() function achieves the same result. Note that in a 2D array, swapping axes 0 and 1 is equivalent to transposing the array.

Choose whichever reads better in your code. swapaxes() exchanges exactly two axes. transpose() instead accepts a full permutation of the axes (for example np.transpose(arr, (2, 0, 1))), so it is more versatile when you need to reorder three or more axes.
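On arrays with more than two dimensions, the two functions are no longer interchangeable. The sketch below compares their results by shape on an arbitrary 3D array:

```python
import numpy as np

# A 3D array with shape (2, 3, 4)
arr3d = np.arange(24).reshape(2, 3, 4)

# swapaxes exchanges exactly two axes: axes 0 and 2 here
print(arr3d.swapaxes(0, 2).shape)            # (4, 3, 2)

# transpose takes a full permutation: new axis i is old axis axes[i]
print(np.transpose(arr3d, (1, 0, 2)).shape)  # (3, 2, 4)

# With no axes argument, transpose reverses the order of all axes
print(np.transpose(arr3d).shape)             # (4, 3, 2)

# Both return views that share memory with the original array
print(np.shares_memory(arr3d, arr3d.swapaxes(0, 2)))  # True
```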

### Q74. How to get the indices of n maximum values in a given array?

You can get the indices of the n maximum values in a NumPy array using the numpy.argsort() function. The argsort() function returns the indices that would sort an array, and you can use it to find the indices of the largest values.

Example:

In [5]:
import numpy as np

# Create an array
arr = np.array([4, 2, 8, 1, 7, 6])

# Get the indices of the 3 maximum values
n = 3
indices_of_max_values = np.argsort(arr)[-n:]

print("Original array:")
print(arr)

print(f"\nIndices of the {n} maximum values:")
print(indices_of_max_values)

Original array:
[4 2 8 1 7 6]

Indices of the 3 maximum values:
[5 4 2]


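In this example, np.argsort(arr) returns the indices that would sort the array in ascending order. The slice [-n:] keeps the last n of them, which are the indices of the n maximum values. These indices are still in ascending order of value, so the index of the largest value comes last. Use np.argsort(arr)[-n:][::-1] if you want them largest first.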

If you also want to get the actual values of the maximum elements, you can use the obtained indices to index into the array:

In [6]:
values_of_max_values = arr[indices_of_max_values]

print(f"\nValues of the {n} maximum values:")
print(values_of_max_values)


Values of the 3 maximum values:
[6 7 8]
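np.argsort() sorts the whole array, which is more work than needed when n is small. A possible alternative is np.argpartition(), which performs only a partial sort. The sketch below pairs it with a small argsort to return the indices largest-first:

```python
import numpy as np

arr = np.array([4, 2, 8, 1, 7, 6])
n = 3

# argpartition only guarantees that the last n positions hold the indices
# of the n largest values; their relative order is not specified
top_idx = np.argpartition(arr, -n)[-n:]

# Sort just those n indices by value, largest first
top_idx = top_idx[np.argsort(arr[top_idx])[::-1]]

print(top_idx)       # [2 4 5]
print(arr[top_idx])  # [8 7 6]
```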


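The [-n:] slice above assumes a 1D array. For a multi-dimensional array, np.argsort() sorts along the last axis by default. To find the n largest values in the whole array, sort the flattened array with np.argsort(arr, axis=None), then convert the resulting flat indices back to coordinates with np.unravel_index(). To find the n largest values along one particular axis, pass that axis to np.argsort() instead.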
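Here is a sketch of both multi-dimensional cases on a small made-up 2D array:

```python
import numpy as np

grid = np.array([[4, 9, 1],
                 [7, 2, 8]])
n = 2

# axis=None sorts the flattened array, giving flat (1D) indices
flat_idx = np.argsort(grid, axis=None)[-n:]

# Convert flat indices back to (row, column) coordinates
rows, cols = np.unravel_index(flat_idx, grid.shape)
print(list(zip(rows.tolist(), cols.tolist())))  # [(1, 2), (0, 1)]
print(grid[rows, cols])                         # [8 9]

# Per-row alternative: column indices of the n largest values in each row
print(np.argsort(grid, axis=1)[:, -n:].tolist())  # [[0, 1], [0, 2]]
```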

### Q75. What is categorical data in pandas?


In pandas, categorical data is a data type that represents categorical variables or factors. Categorical data is used to store data that can take on a limited, fixed number of distinct categories or levels. This type of data is often used to represent qualitative data or nominal data.

In pandas, categorical data uses the category dtype, created with pd.Categorical or dtype='category'. Each value is stored internally as a small integer code that refers to an array of the unique categories. This is efficient when the same few categories repeat many times in a large dataset.

Key characteristics of categorical data in pandas:

1. Limited, Fixed Set of Categories:

- Categorical data represents a fixed and limited set of categories or levels.

2. Ordinal or Nominal:

- Categorical data can be either ordinal or nominal.
- Ordinal categorical data has a meaningful order or hierarchy among categories.
- Nominal categorical data has no inherent order.

3. Efficient Storage and Operations:

- Because each value is stored as an integer code rather than as a separate string object, categorical columns usually use less memory than object columns. They can also speed up operations such as grouping, especially when a few categories repeat many times.

4. Supported Operations:

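- Categorical data supports operations such as sorting and grouping. Sorting follows the order of the categories rather than lexical order, and ordered categoricals also support comparisons such as < and >.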

An example of how to create and use categorical data in pandas:

In [7]:
import pandas as pd

# Create a pandas Series with categorical data
data = pd.Series(['cat', 'dog', 'dog', 'bird', 'cat'], dtype='category')

# Display the categorical data
print(data)

0     cat
1     dog
2     dog
3    bird
4     cat
dtype: category
Categories (3, object): ['bird', 'cat', 'dog']


In this example, the dtype='category' argument creates a categorical Series. The categories are inferred from the unique values in the data, and the output lists them. Inferred categories are sorted lexically, but this Series is unordered, so the listing does not imply that 'bird' ranks below 'cat'.

Categorical data is particularly useful when you have a column with a limited set of values that are repeated, and you want to save memory and improve performance by representing them as categories.
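The example above creates an unordered (nominal) categorical. The sketch below shows the ordinal case and the memory effect. The category names and the 30,000-row column are made up for illustration:

```python
import pandas as pd

# An ordered categorical: the order of the categories list defines the ranking
size_type = pd.CategoricalDtype(['small', 'medium', 'large'], ordered=True)
sizes = pd.Series(['medium', 'small', 'large', 'small'], dtype=size_type)

print(sizes.cat.codes.tolist())      # [1, 0, 2, 0]  (integer codes stored internally)
print(sizes.sort_values().tolist())  # ['small', 'small', 'medium', 'large']
print((sizes > 'small').tolist())    # [True, False, True, False]

# Memory comparison on a column where three values repeat many times
animals = pd.Series(['cat', 'dog', 'bird'] * 10_000)
as_text = animals.memory_usage(deep=True)
as_category = animals.astype('category').memory_usage(deep=True)
print(as_text > as_category)         # True
```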

### Q76. How can we transform a true/false value to 1/0 in a dataframe?


To transform True/False values to 1/0 in a DataFrame, you can use the astype(int) method, which converts True to 1 and False to 0.

Example:

In [8]:
import pandas as pd

# Create a sample DataFrame with boolean values
data = {'A': [True, False, True],
        'B': [False, True, False]}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Transform True/False to 1/0
df = df.astype(int)

# Display the DataFrame after transformation
print("\nDataFrame after transformation:")
print(df)

Original DataFrame:
       A      B
0   True  False
1  False   True
2   True  False

DataFrame after transformation:
   A  B
0  1  0
1  0  1
2  1  0


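In this example, df.astype(int) converts every column in the DataFrame to integers. That works here because all the columns are boolean. If the DataFrame also has non-boolean columns, those are converted too, and the call raises an error for values that cannot become integers, such as arbitrary text.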

If you only want to convert specific columns, select them and assign the converted values back:

In [9]:
import pandas as pd

# Create a sample DataFrame with boolean values
data = {'A': [True, False, True],
        'B': [False, True, False]}
df = pd.DataFrame(data)

# Display the original DataFrame
print("Original DataFrame:")
print(df)

# Transform 'A' column (True/False to 1/0)
df['A'] = df['A'].astype(int)

# Display the DataFrame after transformation
print("\nDataFrame after transformation:")
print(df)

Original DataFrame:
       A      B
0   True  False
1  False   True
2   True  False

DataFrame after transformation:
   A      B
0  1  False
1  0   True
2  1  False


In this example, only the 'A' column is transformed from True/False to 1/0. Adjust the column selection based on your specific requirements.
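Sometimes a DataFrame mixes boolean columns with other types. In that case, one possible approach is to pick out the boolean columns by dtype with select_dtypes() and convert only those. The sketch below assumes a made-up 'Name' text column:

```python
import pandas as pd

# A DataFrame that mixes boolean and non-boolean columns
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eva'],
                   'A': [True, False, True],
                   'B': [False, True, False]})

# Select only the boolean columns, then convert them to 0/1
bool_cols = df.select_dtypes(include='bool').columns
df[bool_cols] = df[bool_cols].astype(int)

print(df)
#     Name  A  B
# 0  Alice  1  0
# 1    Bob  0  1
# 2    Eva  1  0
```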

### Q77. How are loc[] and iloc[] different?


In pandas, loc[] and iloc[] are indexers (used with square brackets rather than called like methods) for selecting data from a DataFrame. They differ in how they interpret the row and column selectors you pass them.

##### loc[] (label-based selection):
- loc[] is label-based: it selects rows and columns by their index labels and column names.
- It accepts a row selector and, optionally, a column selector: df.loc[rows, cols].
- Each selector can be a single label, a list of labels, a slice, or a boolean array.
- Label slices include both the start and the stop label.

Example:

In [10]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Using loc to select data
selection = df.loc[['row1', 'row2'], ['A', 'B']]
print(selection)

      A  B
row1  1  4
row2  2  5


##### iloc[] (integer-location based selection):
- iloc[] is position-based: it selects rows and columns by their integer positions, starting at 0.
- It accepts a row selector and, optionally, a column selector: df.iloc[rows, cols].
- Each selector can be an integer, a list of integers, a slice, or a boolean array.
- Slices follow Python's convention: the stop position is excluded.

Example:

In [11]:
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Using iloc to select data
selection = df.iloc[0:2, 0:2]
print(selection)

   A  B
0  1  4
1  2  5


##### In summary:

- Use loc[] when you want to select data based on labels (row and column names).
- Use iloc[] when you want to select data based on integer positions (row and column indices).
- Both indexers accept single values, lists and slices. When slicing, loc[] includes the stop label, while iloc[] excludes the stop position.
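The difference matters most when the index consists of integers that do not match the row positions. This sketch uses a made-up DataFrame with a reversed integer index:

```python
import pandas as pd

# Integer labels that do not match the row positions
df = pd.DataFrame({'A': [10, 20, 30, 40]}, index=[3, 2, 1, 0])

# loc: label-based; the slice 3:1 runs from label 3 to label 1, inclusive
print(df.loc[3:1, 'A'].tolist())   # [10, 20, 30]

# iloc: position-based; the slice 0:2 covers positions 0 and 1 only
print(df.iloc[0:2, 0].tolist())    # [10, 20]

# The same integer can refer to different rows
print(df.loc[0, 'A'])              # 40 (the row labelled 0)
print(df.iloc[0, 0])               # 10 (the first row)
```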

### Q78. How do you sort a dataframe by two columns?

You can sort a DataFrame by two columns in pandas using the sort_values() method. The sort_values() method allows you to specify one or more columns by which the DataFrame should be sorted. To sort by multiple columns, pass a list of column names to the by parameter.

Example:

In [12]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 22, 30, 28, 23],
        'Score': [95, 89, 75, 82, 97]}

df = pd.DataFrame(data)

# Sort the DataFrame by 'Age' in ascending order and then by 'Score' in descending order
sorted_df = df.sort_values(by=['Age', 'Score'], ascending=[True, False])

# Display the sorted DataFrame
print(sorted_df)

      Name  Age  Score
1      Bob   22     89
4      Eva   23     97
0    Alice   25     95
3    David   28     82
2  Charlie   30     75


In this example, sort_values() sorts the DataFrame first by 'Age' in ascending order and then by 'Score' in descending order, as set by ascending=[True, False]. The second column only decides the order of rows that tie on the first. Because every age in this sample is distinct, 'Score' never changes the result here. Adjust the ascending list to choose the direction for each column.
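Every age in the sample above is distinct, so the secondary key never takes effect there. The following sketch uses made-up data with repeated ages so that the descending 'Score' order becomes visible within each age group:

```python
import pandas as pd

# Made-up data in which ages repeat, so the second sort key matters
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 22, 25, 22],
                   'Score': [80, 91, 95, 70]})

sorted_df = df.sort_values(by=['Age', 'Score'], ascending=[True, False])
print(sorted_df['Name'].tolist())  # ['Bob', 'David', 'Charlie', 'Alice']
```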

### Q79. Find the row which has the maximum value of a given column.

To find the row that has the maximum value in a given column of a pandas DataFrame, you can use idxmax() to get the index label of the maximum and then pass that label to loc[].

Example:

In [13]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
        'Age': [25, 22, 30, 28, 23],
        'Score': [95, 89, 75, 82, 97]}

df = pd.DataFrame(data)

# Specify the column for which you want to find the maximum value
column_name = 'Score'

# Find the row with the maximum value in the specified column
max_row = df.loc[df[column_name].idxmax()]

# Display the result
print(f"Row with the maximum value in column '{column_name}':")
print(max_row)

Row with the maximum value in column 'Score':
Name     Eva
Age       23
Score     97
Name: 4, dtype: object


In this example, df[column_name].idxmax() returns the index of the maximum value in the specified column ('Score' in this case). Using this index, df.loc[] retrieves the corresponding row.

**Note:** If there are multiple rows with the maximum value in the specified column, idxmax() returns the index of the first occurrence of the maximum value.

You can customize the column_name variable to find the row with the maximum value in any column of interest.
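Because idxmax() returns only the first match, you may want every tied row instead. One possible approach is boolean indexing against the column's maximum, and nlargest() with keep='all' is another. The sketch below uses made-up data with a tie:

```python
import pandas as pd

# Made-up data where two rows share the top score
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Eva'],
                   'Score': [97, 89, 97]})

# idxmax() returns only the first matching index label
print(df.loc[df['Score'].idxmax(), 'Name'])   # Alice

# Boolean indexing keeps every row equal to the maximum
ties = df[df['Score'] == df['Score'].max()]
print(ties['Name'].tolist())                  # ['Alice', 'Eva']

# nlargest with keep='all' also keeps rows tied for the last place
print(df.nlargest(1, 'Score', keep='all')['Name'].tolist())
```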

### Q80. How can you split a column which contains strings into multiple columns?


You can split a column containing strings into multiple columns in pandas using the str.split() method. The str.split() method is used to split each element of a column into a list of substrings based on a specified delimiter. You can then expand these lists into multiple columns using the expand=True parameter.

Example:

In [14]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John Doe', 'Jane Smith', 'Bob Johnson']}
df = pd.DataFrame(data)

# Split the 'Name' column into multiple columns
df[['First_Name', 'Last_Name']] = df['Name'].str.split(' ', expand=True)

# Display the result
print(df)

          Name First_Name Last_Name
0     John Doe       John       Doe
1   Jane Smith       Jane     Smith
2  Bob Johnson        Bob   Johnson


In this example, the 'Name' column is split into two columns ('First_Name' and 'Last_Name') using the space character as the delimiter. The str.split(' ', expand=True) method splits the strings into a list of substrings based on the space character and expands them into separate columns.

You can use a different delimiter by passing another character or string to str.split(). The n parameter limits the number of splits, so n=1 produces at most two parts: the text before the first delimiter and everything after it.

In [None]:
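import pandas as pd

# Example with a different delimiter (values such as 'John_Doe')
df_underscore = pd.DataFrame({'Name': ['John_Doe', 'Jane_Smith']})
df_underscore[['First_Name', 'Last_Name']] = df_underscore['Name'].str.split('_', expand=True)

# Example with a limited number of splits: n=1 splits at the first space only,
# so 'Mary Ann Lee' becomes 'Mary' and 'Ann Lee'
df_limited = pd.DataFrame({'Name': ['Mary Ann Lee', 'Bob Johnson']})
df_limited[['First_Name', 'Rest']] = df_limited['Name'].str.split(' ', n=1, expand=True)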

Adjust the code based on your specific requirements and the structure of the strings in your column.
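When some names include middle names, splitting at the first space puts them in the wrong column. A possible alternative is str.rsplit(), which counts splits from the right. The sketch below uses made-up names, one of which has no delimiter at all:

```python
import pandas as pd

# Made-up names, including a middle name and a single-word name
df = pd.DataFrame({'Name': ['Mary Ann Lee', 'Bob Johnson', 'Cher']})

# rsplit with n=1 splits only at the last space
parts = df['Name'].str.rsplit(' ', n=1, expand=True)
parts.columns = ['Given_Names', 'Last_Name']
df = df.join(parts)

print(df)
```

Rows without the delimiter get a missing value in the second column, so you may want to handle those explicitly (for example with fillna()).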